I think this is a great idea. Personally I think this is how folder: should work. I find the semantics of folder: to be useless except where they happen to coincide with the boolean semantics used here. Unfortunately, changing folder: would require versioning the database, which we have only primordial support for right now. Various comments below, though nothing major. Of course, we'd also need some tests and man page updates for this. On Tue, 12 Nov 2013, Peter Zijlstra <peterz@infradead.org> wrote: > Subject: notmuch: Add "maildir:" search option > > The current "folder:" search terms are near useless when you have > recursive folders, introduce a boolean "maildir:" search term to > exactly match the maildir folder. > > Given a Maildir++ layout like: > > ~/Maildir/ > ~/Maildir/cur > ~/Maildir/new > ~/Maildir/tmp > ~/Maildir/.Sent > ~/Maildir/.Sent/cur > ~/Maildir/.Sent/new > ~/Maildir/.Sent/tmp > ~/Maildir/.INBOX.LKML > ~/Maildir/.INBOX.LKML/cur > ~/Maildir/.INBOX.LKML/new > ~/Maildir/.INBOX.LKML/tmp > ~/Maildir/.INBOX.LKML.netdev > ~/Maildir/.INBOX.LKML.netdev/cur > ~/Maildir/.INBOX.LKML.netdev/new > ~/Maildir/.INBOX.LKML.netdev/tmp > ~/Maildir/.INBOX.LKML.arch > ~/Maildir/.INBOX.LKML.arch/cur > ~/Maildir/.INBOX.LKML.arch/new > ~/Maildir/.INBOX.LKML.arch/tmp > > This patch generates the following search index: > > $ delve -a Maildir/.notmuch/xapian/ | ~/s XMAILDIR > XMAILDIR:INBOX > XMAILDIR:INBOX/LKML > XMAILDIR:INBOX/LKML/arch > XMAILDIR:INBOX/LKML/netdev > XMAILDIR:Sent > > Which allows one (me!!1) to pose queries like: > > maildir:INBOX and not tag:list > > to more easily find offlist mail (from people like my family who don't > actually send their stuff over LKML :-). > > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > --- > > XXX: now I need to go figure out how to do searches like: > > subject:PATCH/0 > > which would mandate that PATCH is the first word occurring in the > subject. I think the position index holds enough information but I > need to look into that and obviously the query parser needs work for > this. This is a frequently requested feature. Unfortunately, there are two technical hurdles. One is that the position information actually *isn't* enough as it is both because the subject will start at some arbitrary term offset that depends on the from and to (and maybe other things) and because the Xapian Query structure doesn't expose a way to search for a specific term offset (only relative offsets). The other is that we currently use Xapian's query parser, which isn't extensible, though I took a stab at a custom query parser long ago and swear that one of these days I'll return to it. > > > lib/database.cc | 7 ++++--- > lib/message.cc | 40 ++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 44 insertions(+), 3 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index a021bf17253c..53aeaa68954d 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = { > { "thread", "G" }, > { "tag", "K" }, > { "is", "K" }, > - { "id", "Q" } > + { "id", "Q" }, > + { "maildir", "XMAILDIR:" }, No colon. That is, the term prefix should just be "XMAILDIR". > }; > > static prefix_t PROBABILISTIC_PREFIX[]= { > { "from", "XFROM" }, > { "to", "XTO" }, > { "attachment", "XATTACHMENT" }, > - { "subject", "XSUBJECT"}, > - { "folder", "XFOLDER"} > + { "subject", "XSUBJECT" }, > + { "folder", "XFOLDER" }, Unintentional whitespace change? > }; > > const char * > diff --git a/lib/message.cc b/lib/message.cc > index 1b4637950f8e..45a727a6208f 100644 > --- a/lib/message.cc > +++ b/lib/message.cc > @@ -22,6 +22,7 @@ > #include "database-private.h" > > #include <stdint.h> > +#include <string.h> > > #include <gmime/gmime.h> > > @@ -485,6 +486,8 @@ _notmuch_message_add_filename (notmuch_message_t *message, > notmuch_status_t status; > void *local = talloc_new (message); > char *direntry; > + char *maildir; > + int i; > > if (filename == NULL) > INTERNAL_ERROR ("Message filename cannot be NULL."); > @@ -507,6 +510,43 @@ _notmuch_message_add_filename (notmuch_message_t *message, > /* New terms allow user to search with folder: specification. */ > _notmuch_message_gen_terms (message, "folder", directory); > > + /* Convert the directory into a maildir path */ > + maildir = talloc_strdup(local, directory); Add a space before the "(". (Same thing throughout the rest of the patch.) > + > + /* Strip the maildir "cur", "new" directory entries. */ > + i = strlen(maildir); > + if (strncmp(maildir + i - 3, "cur", 3) == 0 || > + strncmp(maildir + i - 3, "new", 3) == 0) { This is unsafe if directory is less than three characters, which I believe could happen if the message is in the root mail directory (which shouldn't happen with a well-formed maildir, but notmuch doesn't require maildir, and, regardless, we should be defensive). Also, we have a STRNCMP_LITERAL macro that we often use for comparisons with string literals, but I'm good with this, too. > + maildir[i - 3] = '\0'; > + i -= 3; > + } > + > + /* Strip trailing '/' */ > + while (maildir[i-1] == '/') { This is also unsafe if maildir is the empty string. Also, add spaces around the "-" (likewise on the next line). > + maildir[i-1] = '\0'; > + i--; > + } > + > + /* Strip leading '/' */ > + while (maildir[0] == '/') > + maildir++; > + > + /* Strip leading '.' */ > + while (maildir[0] == '.') Why strip multiple dots? (I'm not saying you shouldn't, I'm just curious; and it may be worth explaining in the comment.) > + maildir++; > + > + /* Replace all remaining '.' with '/' */ I think this should only happen if there was a leading '.', indicating a Maildir++ folder hierarchy. A lot of people use the "file system" Maildir layout, which just consists of nested directories where the leaves are Maildirs (e.g., Dovecot's LAYOUT=fs, http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure). In this case, it's perfectly legitimate to have '.'s in folder names and it would be confusing and unexpected to translate them to '/'s. This would also make this compatible with MH folders (which notmuch supports, though admittedly it's unclear if anyone actually uses them). > + for (i = 0; maildir[i]; i++) { > + if (maildir[i] == '.') > + maildir[i] = '/'; > + } > + > + /* If there's no string left, we're the "INBOX" */ > + if (maildir[0] == '\0') > + maildir = talloc_strdup(local, "INBOX"); > + > + _notmuch_message_add_term (message, "maildir", maildir); > + > talloc_free (local); > > return NOTMUCH_STATUS_SUCCESS; > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch