Re: nomuch_addresses.py

Date: Wed, 22 Feb 2012 13:07:35 +0000

To: Jesse Rosenthal, Daniel Schoepe, Justus Winter, Philippe LeCavalier, notmuch@notmuchmail.org

On Tue, 21 Feb 2012 11:33:38 -0500, Jesse Rosenthal <jrosenthal@jhu.edu> wrote:
> On Tue, 21 Feb 2012 14:53:06 +0100, Daniel Schoepe <daniel@schoepe.org> wrote:
> > On Tue, 21 Feb 2012 09:15:09 -0000, Justus Winter <4winter@informatik.uni-hamburg.de> wrote:
> > The reason I mentioned nottoomuch-addresses at all, is that completion
> > itself is _a lot_ faster (at least for me), compared to
> > addrlookup. According to the wiki, notmuch-addresses.py is even slower
> > than addrlookup, so I thought (and still think) that it was worth
> > mentioning. Of course, one could rewrite the database-generation part in
> > python using the bindings, but I personally don't think it's that
> > necessary.
> 
> I'm not sure what speed comparisons were being used -- I think it was
> Sebastian comparing vala to python. In any case, using
> notmuch_addresses.py to look up a common prefix ("Jes") on a slowish
> computer takes 0.2 seconds. So I'm not sure if the speed is all that
> much of an issue. It might be a question of cache temperature, though --
> it'll probably take longer the first time you run it. Still, even trying
> something out on a cold cache, it seems to be about a second.

Speed comparisons between vanilla notmuch_addresses.py and
nottoomuch-addresses.sh are going to be flawed because the two tools do
different things. It's comparing apples and oranges.

notmuch_addresses.py looks for matches in the recipients of mails the
user has sent. Nothing else. notmuch_addresses.py filters out multiple
names for one email address using a popularity contest.
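To illustrate what I mean by a popularity contest, here is a minimal
sketch. It is not the actual code from notmuch_addresses.py; the
function name pick_popular_names and the sample addresses are made up
for the example. It counts how often each display name appears for an
address and keeps the most frequent one.

```python
from collections import Counter, defaultdict
from email.utils import getaddresses


def pick_popular_names(header_values):
    """Map each email address to its most frequently seen display name.

    header_values is a list of raw To:/Cc: header strings.
    (Hypothetical helper, for illustration only.)
    """
    counts = defaultdict(Counter)
    for name, addr in getaddresses(header_values):
        if addr:
            # Treat addresses case-insensitively; count each name variant.
            counts[addr.lower()][name] += 1
    # The name with the highest count wins the contest for its address.
    return {addr: names.most_common(1)[0][0] for addr, names in counts.items()}


# Hypothetical recipient headers taken from sent mail.
headers = [
    "Jesse Rosenthal <jesse@example.org>",
    "Jesse R <jesse@example.org>",
    "Jesse Rosenthal <jesse@example.org>, Daniel Schoepe <daniel@example.org>",
]
winners = pick_popular_names(headers)
```

With these headers, jesse@example.org ends up with the single name
"Jesse Rosenthal", since it was seen twice and "Jesse R" only once.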

AFAICT nottoomuch-addresses.sh scans all the addresses in all the
mails. It has no logic for filtering out multiple names for one email
address, and just returns all matches.
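
By contrast, a lookup without that filtering could look roughly like the
sketch below. This is only an assumption about the general idea, not the
script's real implementation: the function name match_all, the
case-insensitive substring match, and the sample cache lines are all
invented for the example.

```python
def match_all(needle, cached_addresses):
    """Return every cached address line containing needle.

    No deduplication is done, so one email address may show up
    several times under different names. (Hypothetical helper.)
    """
    needle = needle.lower()
    return [line for line in cached_addresses if needle in line.lower()]


# Hypothetical contents of a pre-built address cache.
cache = [
    "Jesse Rosenthal <jesse@example.org>",
    "Jesse R <jesse@example.org>",
    "Daniel Schoepe <daniel@example.org>",
]
matches = match_all("Jes", cache)
```

Here both name variants for jesse@example.org come back, which is the
duplicate-name behaviour I described above.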

Personally I would like to have the best of both worlds, so I'm using a
modified notmuch_addresses.py that matches against all the mails I have
and cleans up the duplicate results. Unfortunately that takes a toll on
performance: typical searches take about a second on my system with a
hot cache, while nottoomuch-addresses.sh takes less than a tenth of a
second. That is enough to be annoying, I'm afraid. Even so, it's not a
fair comparison, because notmuch_addresses.py wasn't designed with this
in mind, and nottoomuch-addresses.sh maintains its own database and does
less.

One just needs to pick the tool that best fits one's needs.

BR,
Jani.
