storing From and Subject in xapian

Subject: storing From and Subject in xapian

Date: Tue, 03 May 2011 20:40:45 -0700

To: notmuch@notmuchmail.org

Cc:

From: Istvan Marko


I have been looking at the I/O patterns of "notmuch search" with the
default output format and noticed that it has to parse the maildir file
of every matched message to get the From and Subject headers. I figured
that this must be slowing things down, especially when the files are not
in the filesystem cache.

So I wanted to see how much difference it would make to have the From
and Subject stored in xapian to avoid this parsing.

With the attached patch I get a speedup of about 2x when the files are
cached and almost 10x when they are not, for searches with many matches.
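
To make the two code paths concrete, here is a minimal Python sketch. It
is not the patch itself (that is against notmuch's own code); it only
illustrates the idea. The dict stands in for per-document value slots in
the index, and the slot numbers and function names are made up for the
example.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical slot numbers; the real ones would be whatever the patch picks.
FROM_SLOT, SUBJECT_SLOT = 2, 3


def headers_from_file(path):
    """Slow path: open the message file and parse its header block."""
    with open(path, "rb") as fp:
        msg = BytesParser(policy=policy.default).parse(fp, headersonly=True)
    return str(msg.get("From", "")), str(msg.get("Subject", ""))


def index_message(store, doc_id, path):
    """At index time, parse the file once and keep both headers in the store."""
    sender, subject = headers_from_file(path)
    store[doc_id] = {FROM_SLOT: sender, SUBJECT_SLOT: subject}


def headers_from_index(store, doc_id):
    """Fast path: answer from the stored slots without touching the file."""
    slots = store[doc_id]
    return slots.get(FROM_SLOT, ""), slots.get(SUBJECT_SLOT, "")
```

At search time only headers_from_index() would be called for each match,
so the message files are never opened, which is where I would expect the
saving with uncached files to come from.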

The attached patch is only intended as proof of concept. I am not
familiar with xapian so I wasn't sure if this kind of data should be
stored as terms, values or data. I went with values simply because I saw
that message-id and timestamp were already stored that way. Perhaps the
document data would be more appropriate since the fields are not used
for searching or sorting. Oh, and for some reason I get a blank Subject
for about 1% of the matches.


Is there a downside to this approach? The only one I see is that the
xapian db size increases by about 1%, but to me the speed increase would
be well worth it.


-- 
	Istvan
