On Wed, 25 Jan 2017, Tomi Ollila <tomi.ollila@iki.fi> wrote:
> On Sat, Jan 21 2017, David Bremner <david@tethera.net> wrote:
>
>> the idea is that you can run
>>
>> % notmuch search subject:/<your-favourite-regexp>/
>> % notmuch search from:/<your-favourite-regexp>/
>
> I like this interface.

FWIW I think this is superior to the earlier alternatives too.

I think people would like to use regexps (or globbing) for path: and
folder: queries. Is there a risk of ambiguity between normal path: and
folder: searches and regexp searches due to "/"? I suppose the normal
queries never begin with "/" for them (due to being relative to the
database path, not absolute) but is that confusing?

BR,
Jani.

>
>>
>> or
>>
>> % notmuch search subject:"your usual phrase search"
>> % notmuch search from:"usual phrase search"
>>
>> This should also work with bindings, since it extends the query parser.
>>
>> This is trivial to extend for other value slots, but currently the only
>> value slots are date, message_id, from, subject, and last_mod. Date is
>> already searchable, and message_id is not obviously useful to regex
>> match.
>
> Why would message_id not be useful to regex match? I can come up with quite
> a few use cases... but if there are technical difficulties... then that
> should be mentioned instead.
>
> maybe this commit message should inform that xapian with field processors
> (1.4.x) is required for this feature -- and emphasize it a bit better in
> the manual page?
>
> Probably '//' is used to escape '/' -- should such a character ever be
> needed in regex search.
>
>>
>> This was originally written by Austin Clements, and ported to Xapian
>> field processors (from Austin's custom query parser) by yours truly.
>> ---
>>
>> This version implements the use of // to delimit regular expressions.
>> I have not tested the code paths with old (pre field processor) xapian.
>
> Fedora 25 has 1.2.24 -- T630 tests are skipped. It looks like these changes
> did not increase the failure count there.
>
> Some (mostly whitespace nitpicking) comments below:
>
>
>>
>>  doc/man7/notmuch-search-terms.rst |  27 +++++++-
>>  lib/Makefile.local                |   1 +
>>  lib/database-private.h            |   2 +
>>  lib/database.cc                   |  29 +++++++-
>>  lib/regexp-fields.cc              | 142 ++++++++++++++++++++++++++++++++++++++
>>  lib/regexp-fields.h               |  77 +++++++++++++++++++++
>>  test/T630-regexp-query.sh         |  82 ++++++++++++++++++++++
>>  7 files changed, 354 insertions(+), 6 deletions(-)
>>  create mode 100644 lib/regexp-fields.cc
>>  create mode 100644 lib/regexp-fields.h
>>  create mode 100755 test/T630-regexp-query.sh
>>
>> diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
>> index de93d733..d8527e18 100644
>> --- a/doc/man7/notmuch-search-terms.rst
>> +++ b/doc/man7/notmuch-search-terms.rst
>> @@ -34,10 +34,14 @@ indicate user-supplied values):
>>
>>  - from:<name-or-address>
>>
>> +- from:/<regex>/
>> +
>>  - to:<name-or-address>
>>
>>  - subject:<word-or-quoted-phrase>
>>
>> +- subject:/<regex>/
>> +
>>  - attachment:<word>
>>
>>  - mimetype:<word>
>> @@ -71,6 +75,17 @@ subject of an email. Searching for a phrase in the subject is supported
>>  by including quotation marks around the phrase, immediately following
>>  **subject:**.
>>
>> +The **from:** and **subject** prefix can be also used to restrict the
>> +results to those whose from/subject value matches a regular
>> +expression (see **regex(7)**) delimited with //.
>> +
>> +::
>> +
>> +    notmuch search 'from:/bob@.*[.]example[.]com/'
>> +
>> +Regular expression searches are only available if notmuch is built
>> +with **Xapian Field Processors** (see below).
>
> And the poor user stopped reading far before this line, desperately trying
> the regex searches... >;/ so IMO this requirement should be notified earlier.
>
>> +
>>  The **attachment:** prefix can be used to search for specific filenames
>>  (or extensions) of attachments to email messages.
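
For anyone who wants to try such patterns outside notmuch first, here is
a minimal standalone sketch of matching with the POSIX calls and flags
the patch uses (regcomp with REG_EXTENDED | REG_NOSUB, then regexec).
The helper name value_matches and the sample header values mentioned
below are made up for illustration; they are not notmuch API.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper (not part of notmuch): returns true if `value`
// contains a match for the POSIX extended regexp `pattern`.
// Uses the same compile flags as the patch: REG_EXTENDED | REG_NOSUB.
static bool
value_matches (const std::string &pattern, const std::string &value)
{
    regex_t re;

    // An invalid pattern is simply treated as "no match" in this sketch;
    // the patch itself reports the error through an exception instead.
    if (regcomp (&re, pattern.c_str (), REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    // regexec returns 0 on a match; the search is unanchored.
    bool matched = regexec (&re, value.c_str (), 0, nullptr, 0) == 0;
    regfree (&re);
    return matched;
}
```

With this, value_matches ("bob@.*[.]example[.]com", "Bob <bob@mail.example.com>")
is true, while value_matches ("carl", "Carl Worth <cworth@cworth.org>") is
false, because matching is case sensitive unless REG_ICASE is added.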
>>
>> @@ -220,13 +235,18 @@ Boolean and Probabilistic Prefixes
>>  ----------------------------------
>>
>>  Xapian (and hence notmuch) prefixes are either **boolean**, supporting
>> -exact matches like "tag:inbox" or **probabilistic**, supporting a more flexible **term** based searching. The prefixes currently supported by notmuch are as follows.
>> -
>> +exact matches like "tag:inbox" or **probabilistic**, supporting a more
>> +flexible **term** based searching. Certain **special** prefixes are
>> +processed by notmuch in a way not stricly fitting either of Xapian's
>> +built in styles. The prefixes currently supported by notmuch are as
>> +follows.
>>
>>  Boolean
>>     **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
>>  Probabilistic
>> -  **from:**, **to:**, **subject:**, **attachment:**, **mimetype:**
>> +  **to:**, **attachment:**, **mimetype:**
>> +Special
>> +  **from:**, **query:**, **subject:**
>>
>>  Terms and phrases
>>  -----------------
>> @@ -396,6 +416,7 @@ Currently the following features require field processor support:
>>
>>  - non-range date queries, e.g. "date:today"
>>  - named queries e.g. "query:my_special_query"
>> +- regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
>>
>>  SEE ALSO
>>  ========
>> diff --git a/lib/Makefile.local b/lib/Makefile.local
>> index b77e5780..ff812b5f 100644
>> --- a/lib/Makefile.local
>> +++ b/lib/Makefile.local
>> @@ -52,6 +52,7 @@ libnotmuch_cxx_srcs = \
>>      $(dir)/query.cc \
>>      $(dir)/query-fp.cc \
>>      $(dir)/config.cc \
>> +    $(dir)/regexp-fields.cc \
>
> Space instead of TAB above -- tab is used more often (and \:s usually aligned)
>
>>      $(dir)/thread.cc
>>
>>  libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
>> diff --git a/lib/database-private.h b/lib/database-private.h
>> index ccc1e9a1..9f5659a9 100644
>> --- a/lib/database-private.h
>> +++ b/lib/database-private.h
>> @@ -190,6 +190,8 @@ struct _notmuch_database {
>>  #if HAVE_XAPIAN_FIELD_PROCESSOR
>>      Xapian::FieldProcessor *date_field_processor;
>>      Xapian::FieldProcessor *query_field_processor;
>> +    Xapian::FieldProcessor *from_field_processor;
>> +    Xapian::FieldProcessor *subject_field_processor;
>>  #endif
>>      Xapian::ValueRangeProcessor *last_mod_range_processor;
>>  };
>> diff --git a/lib/database.cc b/lib/database.cc
>> index 2d19f20c..8a9ad251 100644
>> --- a/lib/database.cc
>> +++ b/lib/database.cc
>> @@ -21,6 +21,7 @@
>>  #include "database-private.h"
>>  #include "parse-time-vrp.h"
>>  #include "query-fp.h"
>> +#include "regexp-fields.h"
>>  #include "string-util.h"
>>
>>  #include <iostream>
>> @@ -272,12 +273,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
>>      { "folder", "XFOLDER:" },
>>  };
>>
>> -static prefix_t PROBABILISTIC_PREFIX[]= {
>> +static prefix_t REGEX_PREFIX[]= {
>>      { "from", "XFROM" },
>> +    { "subject", "XSUBJECT"},
>> +};
>> +
>> +static prefix_t PROBABILISTIC_PREFIX[]= {
>> +
>
> empty line ^
>
>>      { "to", "XTO" },
>>      { "attachment", "XATTACHMENT" },
>>      { "mimetype", "XMIMETYPE"},
>> -    { "subject", "XSUBJECT"},
>>  };
>>
>>  const char *
>> @@ -295,6 +300,11 @@ _find_prefix (const char *name)
>>              return BOOLEAN_PREFIX_EXTERNAL[i].prefix;
>>      }
>>
>> +    for (i = 0; i < ARRAY_SIZE (REGEX_PREFIX); i++) {
>> +        if (strcmp (name, REGEX_PREFIX[i].name) == 0)
>> +            return REGEX_PREFIX[i].prefix;
>> +    }
>> +
>>      for (i = 0; i < ARRAY_SIZE (PROBABILISTIC_PREFIX); i++) {
>>          if (strcmp (name, PROBABILISTIC_PREFIX[i].name) == 0)
>>              return PROBABILISTIC_PREFIX[i].prefix;
>> @@ -1042,6 +1052,10 @@ notmuch_database_open_verbose (const char *path,
>>          notmuch->query_parser->add_boolean_prefix("date", notmuch->date_field_processor);
>>          notmuch->query_field_processor = new QueryFieldProcessor (*notmuch->query_parser, notmuch);
>>          notmuch->query_parser->add_boolean_prefix("query", notmuch->query_field_processor);
>> +        notmuch->from_field_processor = new RegexpFieldProcessor ("from", *notmuch->query_parser, notmuch);
>> +        notmuch->subject_field_processor = new RegexpFieldProcessor ("subject", *notmuch->query_parser, notmuch);
>> +        notmuch->query_parser->add_boolean_prefix("from", notmuch->from_field_processor);
>> +        notmuch->query_parser->add_boolean_prefix("subject", notmuch->subject_field_processor);
>>  #endif
>>          notmuch->last_mod_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_LAST_MOD, "lastmod:");
>>
>> @@ -1058,7 +1072,12 @@ notmuch_database_open_verbose (const char *path,
>>              notmuch->query_parser->add_boolean_prefix (prefix->name,
>>                                                         prefix->prefix);
>>          }
>> -
>> +#if !HAVE_XAPIAN_FIELD_PROCESSOR
>> +        for (i = 0; i < ARRAY_SIZE (REGEX_PREFIX); i++) {
>> +            prefix_t *prefix = &REGEX_PREFIX[i];
>> +            notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
>> +        }
>> +#endif
>>          for (i = 0; i < ARRAY_SIZE (PROBABILISTIC_PREFIX); i++) {
>>              prefix_t *prefix = &PROBABILISTIC_PREFIX[i];
>>              notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
>> @@ -1138,6 +1157,10 @@ notmuch_database_close (notmuch_database_t *notmuch)
>>      notmuch->date_field_processor = NULL;
>>      delete notmuch->query_field_processor;
>>      notmuch->query_field_processor = NULL;
>> +    delete notmuch->from_field_processor;
>> +    notmuch->from_field_processor = NULL;
>> +    delete notmuch->subject_field_processor;
>> +    notmuch->subject_field_processor = NULL;
>>  #endif
>>
>>      return status;
>> diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
>> new file mode 100644
>> index 00000000..8cb1cada
>> --- /dev/null
>> +++ b/lib/regexp-fields.cc
>> @@ -0,0 +1,142 @@
>> +/* regexp-fields.cc - field processor glue for regex supporting fields
>> + *
>> + * This file is part of notmuch.
>> + *
>> + * Copyright © 2015 Austin Clements
>> + * Copyright © 2016 David Bremner
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 3 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see https://www.gnu.org/licenses/ .
>> + *
>> + * Author: Austin Clements <aclements@csail.mit.edu>
>> + *         David Bremner <david@tethera.net>
>> + */
>> +
>> +#include "regexp-fields.h"
>> +#include "notmuch-private.h"
>> +#include "database-private.h"
>> +#include <stdio.h>
>> +
>> +#if HAVE_XAPIAN_FIELD_PROCESSOR
>> +static void
>> +compile_regex (regex_t &regexp, const char *str)
>> +{
>> +    int err = regcomp (&regexp, str, REG_EXTENDED | REG_NOSUB);
>> +
>> +    if (err != 0) {
>> +        size_t len = regerror (err, &regexp, NULL, 0);
>> +        char *buffer = new char[len];
>> +        std::string msg;
>> +        (void) regerror (err, &regexp, buffer, len);
>> +        msg.assign (buffer, len);
>> +        delete buffer;
>> +
>> +        throw Xapian::QueryParserError (msg);
>> +
>
> empty line ^
>
>> +    }
>> +}
>> +
>> +RegexpPostingSource::RegexpPostingSource (Xapian::valueno slot, const std::string &regexp)
>> +    : slot_ (slot)
>> +{
>> +
>
> ditto
>
>> +    compile_regex (regexp_, regexp.c_str ());
>> +}
>> +
>> +RegexpPostingSource::~RegexpPostingSource ()
>> +{
>> +    regfree (&regexp_);
>> +}
>> +
>> +void
>> +RegexpPostingSource::init (const Xapian::Database &db)
>> +{
>> +    db_ = db;
>> +    it_ = db_.valuestream_begin (slot_);
>> +    end_ = db.valuestream_end (slot_);
>> +    started_ = false;
>> +}
>> +
>> +Xapian::doccount
>> +RegexpPostingSource::get_termfreq_min () const
>> +{
>> +    return 0;
>> +}
>> +
>> +Xapian::doccount
>> +RegexpPostingSource::get_termfreq_est () const
>> +{
>> +    return get_termfreq_max () / 2;
>> +}
>> +
>> +Xapian::doccount
>> +RegexpPostingSource::get_termfreq_max () const
>> +{
>> +    return db_.get_value_freq (slot_);
>> +}
>> +
>> +Xapian::docid
>> +RegexpPostingSource::get_docid () const
>> +{
>> +    return it_.get_docid ();
>> +}
>> +
>> +bool
>> +RegexpPostingSource::at_end () const
>> +{
>> +    return it_ == end_;
>> +}
>> +
>> +void
>> +RegexpPostingSource::next (unused (double min_wt))
>> +{
>> +    if (started_ && ! at_end ())
>> +        ++it_;
>> +    started_ = true;
>> +
>> +    for (; ! at_end (); ++it_) {
>> +        std::string value = *it_;
>> +        if (regexec (&regexp_, value.c_str (), 0, NULL, 0) == 0)
>> +            break;
>> +    }
>> +}
>> +
>> +static inline Xapian::valueno _find_slot (std::string prefix)
>> +{
>> +    if (prefix == "from")
>> +        return NOTMUCH_VALUE_FROM;
>> +    else if (prefix == "subject")
>> +        return NOTMUCH_VALUE_SUBJECT;
>> +    else
>> +        throw Xapian::QueryParserError ("unsupported regexp field '" + prefix + "'");
>> +}
>> +
>> +RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix, Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
>> +    : slot(_find_slot (prefix)), term_prefix(_find_prefix (prefix.c_str ())), parser(parser_), notmuch(notmuch_)
>> +{
>> +};
>> +
>> +Xapian::Query
>> +RegexpFieldProcessor::operator() (const std::string & str)
>> +{
>> +    if (str.at (0) == '/' && str.at (str.size () - 1)){
>> +        RegexpPostingSource *postings = new RegexpPostingSource (slot, str.substr(1,str.size () - 2));
>> +        return Xapian::Query (postings->release ());
>> +    } else {
>> +        /* TODO replace this with a nicer API level triggering of
>> +         * phrase parsing, when possible */
>> +        std::string quoted='"' + str + '"';
>> +        return parser.parse_query (quoted, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
>> +    }
>> +}
>> +#endif
>> diff --git a/lib/regexp-fields.h b/lib/regexp-fields.h
>> new file mode 100644
>> index 00000000..bac11999
>> --- /dev/null
>> +++ b/lib/regexp-fields.h
>> @@ -0,0 +1,77 @@
>> +/* regex-fields.h - xapian glue for semi-bruteforce regexp search
>> + *
>> + * This file is part of notmuch.
>> + *
>> + * Copyright © 2015 Austin Clements
>> + * Copyright © 2016 David Bremner
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 3 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see https://www.gnu.org/licenses/ .
>> + *
>> + * Author: Austin Clements <aclements@csail.mit.edu>
>> + *         David Bremner <david@tethera.net>
>> + */
>> +
>> +#ifndef NOTMUCH_REGEXP_FIELDS_H
>> +#define NOTMUCH_REGEXP_FIELDS_H
>> +#if HAVE_XAPIAN_FIELD_PROCESSOR
>> +#include <sys/types.h>
>> +#include <regex.h>
>> +#include "database-private.h"
>> +#include "notmuch-private.h"
>> +
>> +/* A posting source that returns documents where a value matches a
>> + * regexp.
>> + */
>> +class RegexpPostingSource : public Xapian::PostingSource
>> +{
>> + protected:
>> +    const Xapian::valueno slot_;
>> +    regex_t regexp_;
>> +    Xapian::Database db_;
>> +    bool started_;
>> +    Xapian::ValueIterator it_, end_;
>> +
>> +/* No copying */
>> +    RegexpPostingSource (const RegexpPostingSource &);
>> +    RegexpPostingSource &operator= (const RegexpPostingSource &);
>> +
>> + public:
>> +    RegexpPostingSource (Xapian::valueno slot, const std::string &regexp);
>> +    ~RegexpPostingSource ();
>> +    void init (const Xapian::Database &db);
>> +    Xapian::doccount get_termfreq_min () const;
>> +    Xapian::doccount get_termfreq_est () const;
>> +    Xapian::doccount get_termfreq_max () const;
>> +    Xapian::docid get_docid () const;
>> +    bool at_end () const;
>> +    void next (unused (double min_wt));
>> +};
>> +
>> +
>> +class RegexpFieldProcessor : public Xapian::FieldProcessor {
>> + protected:
>> +    Xapian::valueno slot;
>> +    std::string term_prefix;
>> +    Xapian::QueryParser &parser;
>> +    notmuch_database_t *notmuch;
>> +
>> + public:
>> +    RegexpFieldProcessor (std::string prefix, Xapian::QueryParser &parser_, notmuch_database_t *notmuch_);
>> +
>> +    ~RegexpFieldProcessor () { };
>> +
>> +    Xapian::Query operator()(const std::string & str);
>> +};
>> +#endif
>> +#endif /* NOTMUCH_REGEXP_FIELDS_H */
>> diff --git a/test/T630-regexp-query.sh b/test/T630-regexp-query.sh
>> new file mode 100755
>> index 00000000..722af715
>> --- /dev/null
>> +++ b/test/T630-regexp-query.sh
>> @@ -0,0 +1,82 @@
>> +#!/usr/bin/env bash
>> +test_description='regular expression searches'
>> +. ./test-lib.sh || exit 1
>> +
>> +add_email_corpus
>> +
>> +
>> +if [ $NOTMUCH_HAVE_XAPIAN_FIELD_PROCESSOR -eq 1 ]; then
>> +
>> +    notmuch search --output=messages from:cworth > cworth.msg-ids
>> +
>> +    test_begin_subtest "regexp from search, case sensitive"
>> +    notmuch search --output=messages from:/carl/ > OUTPUT
>> +    test_expect_equal_file /dev/null OUTPUT
>> +
>> +    test_begin_subtest "empty regexp or query"
>> +    notmuch search --output=messages from:/carl/ or from:/cworth/ > OUTPUT
>> +    test_expect_equal_file cworth.msg-ids OUTPUT
>> +
>> +    test_begin_subtest "non-empty regexp and query"
>> +    notmuch search from:/cworth@cworth.org/ and subject:patch > OUTPUT
>> +    cat <<EOF > EXPECTED
>> +thread:0000000000000008 2009-11-18 [1/2] Carl Worth| Alex Botero-Lowry; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
>> +thread:0000000000000007 2009-11-18 [1/2] Carl Worth| Ingmar Vanhassel; [notmuch] [PATCH] Typsos (inbox unread)
>> +thread:0000000000000018 2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
>> +thread:0000000000000017 2009-11-18 [1/2] Carl Worth| Keith Packard; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
>> +thread:0000000000000014 2009-11-18 [2/5] Carl Worth| Mikhail Gusarov, Keith Packard; [notmuch] [PATCH 1/2] Close message file after parsing message headers (inbox unread)
>> +thread:0000000000000001 2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs. (inbox unread)
>> +EOF
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +
>> +    test_begin_subtest "regexp from search, duplicate term search"
>> +    notmuch search --output=messages from:/cworth/ > OUTPUT
>> +    test_expect_equal_file cworth.msg-ids OUTPUT
>> +
>> +    test_begin_subtest "long enough regexp matches only desired senders"
>> +    notmuch search --output=messages 'from:"/C.* Wo/"' > OUTPUT
>> +    test_expect_equal_file cworth.msg-ids OUTPUT
>> +
>> +    test_begin_subtest "shorter regexp matches one more sender"
>> +    notmuch search --output=messages 'from:"/C.* W/"' > OUTPUT
>> +    (echo id:1258544095-16616-1-git-send-email-chris@chris-wilson.co.uk ; cat cworth.msg-ids) > EXPECTED
>
> The above doesn't need to be executed in subshell:
>
>   { echo id:1258544095-16616-1-git-send-email-chris@chris-wilson.co.uk; cat cworth.msg-ids; } > EXPECTED
>
> does it in the same shell
>
>
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +
>> +    test_begin_subtest "regexp subject search, non-ASCII"
>> +    notmuch search --output=messages subject:/accentué/ > OUTPUT
>> +    echo id:877h1wv7mg.fsf@inf-8657.int-evry.fr > EXPECTED
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +
>> +    test_begin_subtest "regexp subject search, punctuation"
>> +    notmuch search subject:/\'X\'/ > OUTPUT
>> +    cat <<EOF > EXPECTED
>> +thread:0000000000000017 2009-11-18 [2/2] Keith Packard, Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
>> +EOF
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +
>> +    test_begin_subtest "regexp subject search, no punctuation"
>> +    notmuch search subject:/X/ > OUTPUT
>> +    cat <<EOF > EXPECTED
>> +thread:0000000000000017 2009-11-18 [2/2] Keith Packard, Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
>> +thread:000000000000000f 2009-11-18 [4/4] Jjgod Jiang, Alexander Botero-Lowry; [notmuch] Mac OS X/Darwin compatibility issues (inbox unread)
>> +EOF
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +
>> +    test_begin_subtest "combine regexp from and subject"
>> +    notmuch search subject:/-C/ and from:/.an.k/ > OUTPUT
>> +    cat <<EOF > EXPECTED
>> +thread:0000000000000018 2009-11-17 [1/2] Jan Janak| Carl Worth; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
>> +EOF
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +
>> +    test_begin_subtest "regexp error reporting"
>> +    notmuch search 'from:/unbalanced[/' 1>OUTPUT 2>&1
>> +    cat <<EOF > EXPECTED
>> +notmuch search: A Xapian exception occurred
>> +A Xapian exception occurred performing query: Invalid regular expression
>> +Query string was: from:/unbalanced[/
>> +EOF
>> +    test_expect_equal_file EXPECTED OUTPUT
>> +fi
>> +
>> +test_done
>> --
>> 2.11.0
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> https://notmuchmail.org/mailman/listinfo/notmuch
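
Two details in the quoted regexp-fields.cc might deserve a second look:
the closing delimiter test, str.at (str.size () - 1), is never compared
against '/', and the regerror() buffer is allocated with new[] but
released with plain delete, then assigned with its trailing NUL
included. Below is a sketch of how both could look, assuming nothing
beyond POSIX regex.h and the standard library. The names extract_regexp
and regex_error_message are hypothetical, not functions from the patch.

```cpp
#include <regex.h>
#include <string>
#include <vector>

// Hypothetical helper: if `str` has the form /pattern/, store the text
// between the two slashes in `pattern` and return true; otherwise return
// false and leave `pattern` untouched.
static bool
extract_regexp (const std::string &str, std::string &pattern)
{
    // Both delimiters are checked explicitly, and strings shorter than
    // two characters are rejected before any indexing happens.
    if (str.size () < 2 || str[0] != '/' || str[str.size () - 1] != '/')
        return false;
    pattern = str.substr (1, str.size () - 2);
    return true;
}

// Hypothetical helper: fetch the regerror() text for a failed regcomp().
static std::string
regex_error_message (int err, const regex_t &regexp)
{
    // With a zero-sized buffer regerror() reports the size it needs,
    // which includes the terminating NUL.
    size_t len = regerror (err, &regexp, nullptr, 0);
    std::vector<char> buffer (len); // freed automatically, no delete[]
    (void) regerror (err, &regexp, buffer.data (), len);
    // Constructing from a C string stops at the NUL, so it is not copied.
    return std::string (buffer.data ());
}
```

A caller would throw Xapian::QueryParserError (regex_error_message (err, regexp))
where the patch currently builds msg by hand; that part is left out here
so the sketch compiles without Xapian.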