Re: [PATCH] notmuch: Add "maildir:" search option

Subject: Re: [PATCH] notmuch: Add "maildir:" search option

Date: Tue, 12 Nov 2013 20:47:27 +0100

To: Austin Clements

Cc: notmuch@notmuchmail.org

From: Peter Zijlstra


On Tue, Nov 12, 2013 at 02:31:25PM -0500, Austin Clements wrote:
> > XXX: now I need to go figure out how to do searches like:
> >
> >   subject:PATCH/0
> >
> > which would mandate that PATCH is the first word occurring in the
> > subject. I think the position index holds enough information but I
> > need to look into that and obviously the query parser needs work for
> > this.
> 
> This is a frequently requested feature.  Unfortunately, there are two
> technical hurdles.  One is that the position information actually
> *isn't* enough as it is both because the subject will start at some
> arbitrary term offset that depends on the from and to (and maybe other
> things) and because the Xapian Query structure doesn't expose a way to
> search for a specific term offset (only relative offsets).  The other is
> that we currently use Xapian's query parser, which isn't extensible,
> though I took a stab at a custom query parser long ago and swear that
> one of these days I'll return to it.

Bah, I knew that would end up being more complex :-)

> >  lib/database.cc |  7 ++++---
> >  lib/message.cc  | 40 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/database.cc b/lib/database.cc
> > index a021bf17253c..53aeaa68954d 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
> >      { "thread",			"G" },
> >      { "tag",			"K" },
> >      { "is",			"K" },
> > -    { "id",			"Q" }
> > +    { "id",			"Q" },
> > +    { "maildir",		"XMAILDIR:" },
> 
> No colon.  That is, the term prefix should just be "XMAILDIR".

No, that colon is crucial; see http://xapian.org/docs/omega/termprefixes.html

"X starts a multi-capital letter user-defined prefix. If you want a
prefix for something without a standard prefix, you create your own
starting with an X (e.g. XSHOESIZE). The prefix ends with the first
non-capital. If the term you're prefixing starts with a capital, add a
":" between prefix and term to resolve ambiguity about where the prefix
ends and the term begins."

Since maildir folder names typically start with a capital we need that
':' in between the prefix and value. I tried, my initial versions didn't
have the ':' and reliably didn't work.

> >  };
> >  
> >  static prefix_t PROBABILISTIC_PREFIX[]= {
> >      { "from",			"XFROM" },
> >      { "to",			"XTO" },
> >      { "attachment",		"XATTACHMENT" },
> > -    { "subject",		"XSUBJECT"},
> > -    { "folder",			"XFOLDER"}
> > +    { "subject",		"XSUBJECT" },
> > +    { "folder",			"XFOLDER" },
> 
> Unintentional whitespace change?

Oops yes.

> > +    /* Convert the directory into a maildir path */
> > +    maildir = talloc_strdup(local, directory);
> 
> Add a space before the "(".  (Same thing throughout the rest of the
> patch.)

Urgh, weird coding style but yes you're right, I should've kept it
consistent with the rest of the file. Will fix.

> > +
> > +    /* Strip the maildir "cur", "new" directory entries. */
> > +    i = strlen(maildir);
> > +    if (strncmp(maildir + i - 3, "cur", 3) == 0 ||
> > +	strncmp(maildir + i - 3, "new", 3) == 0) {
> 
> This is unsafe if directory is less than three characters, which I
> believe could happen if the message is in the root mail directory (which
> shouldn't happen with a well-formed maildir, but notmuch doesn't require
> maildir, and, regardless, we should be defensive).
> 
> Also, we have a STRNCMP_LITERAL macro that we often use for comparisons
> with string literals, but I'm good with this, too.

Quite so, I haven't actually seen that, but you're quite right.

> > +	    maildir[i - 3] = '\0';
> > +	    i -= 3;
> > +    }
> > +
> > +    /* Strip trailing '/' */
> > +    while (maildir[i-1] == '/') {
> 
> This is also unsafe if maildir is the empty string.
> 
> Also, add spaces around the "-" (likewise on the next line).

Will fix.

> > +	    maildir[i-1] = '\0';
> > +	    i--;
> > +    }
> > +
> > +    /* Strip leading '/' */
> > +    while (maildir[0] == '/')
> > +	    maildir++;
> > +
> > +    /* Strip leading '.' */
> > +    while (maildir[0] == '.')
> 
> Why strip multiple dots?  (I'm not saying you shouldn't, I'm just
> curious; and it may be worth explaining in the comment.)

No reason, copy paste damage from above mostly.

> > +	    maildir++;
> > +
> > +    /* Replace all remaining '.' with '/' */
> 
> I think this should only happen if there was a leading '.', indicating a
> Maildir++ folder hierarchy.  A lot of people use the "file system"
> Maildir layout, which just consists of nested directories where the
> leaves are Maildirs (e.g., Dovecot's LAYOUT=fs,
> http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure).  In
> this case, it's perfectly legitimate to have '.'s in folder names and it
> would be confusing and unexpected to translate them to '/'s.  This would
> also make this compatible with MH folders (which notmuch supports,
> though admittedly it's unclear if anyone actually uses them).

Ah, I wasn't aware Dovecot actually supported this Maildir variant --
although I've recently come across this variant someplace and thought it
was odd.

OK, I can make it conditional on the initial leading dot.

I'll respin the patch and try to come up with manpage and test additions
to cover this new functionality.

Thread: