Re: [PATCH 0/3] Speed up notmuch new for unchanged directories

Date: Mon, 25 Jun 2012 13:59:15 -0400

To: Sascha Silbe, notmuch

Cc:

From: Austin Clements


On Sun, 24 Jun 2012, Sascha Silbe <sascha-pgp@silbe.org> wrote:
> All the time I thought what makes "notmuch new" so abysmally slow is the
> stat() for each maildir. But as it continued to be slow even after I
> moved most mails out of 'new' (into 'new-20120624'), I strace'd notmuch
> and noticed it listed even unchanged directories, thereby listing and
> iterating over each and every single of the 900k mails in my mail store.
>
> There's still quite some room for further improvements as it continues
> to take several minutes to scan < 100 new mails in changed directories
> containing < 1000 mails in total. Even the rsync run that fetches the
> new mails is faster.

I haven't looked over your patches yet, but this result surprises me.
Could you explain your setup a little more?  How much mail do you have
and across how many directories?  What file system are you using?

I'm also surprised that your new approach helps.  This directory listing
has to be read off disk one way or the other, but listing directories is
the bread-and-butter of file systems, whereas I would think that Xapian
would require more IO to accomplish the same effect.  Does your patch
win because you can specifically list subdirectories out of Xapian,
making the IO proportional to the number of subdirectories instead of
the number of subdirectories and files (even though the constant factors
probably favor reading from the file system)?
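
To make the question concrete, here is a minimal sketch of the trade-off
being asked about. It is not notmuch's code and not the patch under
discussion: the `scan` function and the in-memory `index` dict (standing
in for the database) are hypothetical. It assumes that a directory's
mtime changes whenever entries are added to, removed from, or renamed in
it. For an unchanged directory, the subdirectory names come from the
index, so no entries are read from the file system at all.

```python
import os


def scan(root, index):
    """Walk a mail tree, listing only directories whose mtime changed.

    index maps a directory path to (mtime_ns, [subdirectory names]) as
    recorded by a previous scan. Returns (changed_dirs, entries_read).
    """
    changed = []
    entries_read = 0
    stack = [root]
    while stack:
        path = stack.pop()
        mtime = os.stat(path).st_mtime_ns
        cached = index.get(path)
        if cached is not None and cached[0] == mtime:
            # Unchanged: reuse the recorded subdirectory names and skip
            # the directory listing entirely.
            subdirs = cached[1]
        else:
            # New or changed: read every entry, files included.
            subdirs = []
            with os.scandir(path) as entries:
                for entry in entries:
                    entries_read += 1
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.name)
            index[path] = (mtime, subdirs)
            changed.append(path)
        stack.extend(os.path.join(path, name) for name in subdirs)
    return changed, entries_read
```

In this sketch the work for an unchanged directory is one stat() plus an
index lookup, so the cost scales with the number of directories rather
than with directories plus files. Whether a real database lookup is
cheaper than the file system's own listing is exactly the open question.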

I like the idea of these patches; I just want to make sure I have a firm
grip on what's being optimized and why it wins.
