v2 sexpr parser

Subject: v2 sexpr parser
Date: Sat, 17 Jul 2021 23:39:56 -0300
To: notmuch@notmuchmail.org
From: David Bremner

This is a substantially revised version of the series at [1]. As far
as I know, it now understands (a translation of) most of the queries
handled by the existing query parser. Some remaining limitations and
issues:

1) The new query parser is only hooked into the notmuch search
subcommand. It should be fairly rote to hook it into the other
relevant subcommands, but I want to resolve (2) before doing that.

2) The command line option --query-syntax={sexp,xapian} is a bit
clunky. Also, "xapian" should perhaps be renamed "infix" to match the
'infix' operator in the new parser.

3) There is no documentation. I think notmuch-search-terms(7) is too
long already, so there should probably be a separate manual page. I
don't want to write that until I'm sure we want the new parser.

4) There is still some uncertainty around utf8 handling in sfsexp.

5) I'm not too sure about the new API call
notmuch_query_create_sexpr. I guess a more idiomatic thing to do would
be to add a new function with an extra argument, and have the old
function call it.

6) User-defined headers are used a bit differently in the new parser
than in the existing one. Instead of (List notmuch), you currently
have to write (header List notmuch). I don't know if that's better or
worse. It is a bit more typing, but maybe a bit clearer to read. It
would probably not be too hard to switch.

7) Trailing wildcards like "subject:foo*" are not implemented yet.
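For (5), here is a minimal sketch of the more idiomatic pattern: a new
function that takes the query syntax as an extra argument, with the
old function kept as a thin wrapper. It is written in Python for
brevity rather than in C, and every name in it (QuerySyntax,
query_create_with_syntax, query_create) is hypothetical; the "query"
it returns is just a tuple recording which parser would be used.

```python
from enum import Enum


class QuerySyntax(Enum):
    """Hypothetical selector for which parser handles a query string."""
    XAPIAN = "xapian"  # existing infix parser
    SEXP = "sexp"      # new s-expression parser


def query_create_with_syntax(query_string, syntax):
    """Hypothetical new entry point taking the syntax as an extra argument.

    A real implementation would dispatch to the matching parser; this
    sketch only records the choice so the delegation pattern is visible.
    """
    if not isinstance(syntax, QuerySyntax):
        raise ValueError(f"unknown query syntax: {syntax!r}")
    return (syntax, query_string)


def query_create(query_string):
    """Hypothetical legacy entry point, kept for compatibility.

    Existing callers keep their behaviour because the old function just
    forwards to the new one with the old default syntax.
    """
    return query_create_with_syntax(query_string, QuerySyntax.XAPIAN)
```

The attraction of this shape is that no existing caller changes, and
only code that wants the s-expression syntax needs the new function.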
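As an illustration of why switching the header syntax in (6) should
be cheap: if queries are first parsed into nested lists, a small
rewriting pass could accept the shorter (List notmuch) form by turning
it into (header List notmuch) whenever the head of a form names a
configured user header. The nested-list representation and the
user_headers set are assumptions made for this sketch, not the
parser's actual data structures.

```python
def normalize_headers(expr, user_headers):
    """Rewrite (Name args...) into (header Name args...) for user headers.

    expr is a parsed s-expression: a str atom or a list of sub-expressions.
    user_headers is a hypothetical set of configured header names, e.g. {"List"}.
    Forms already written as (header Name ...) pass through unchanged.
    """
    if isinstance(expr, str):
        return expr
    if expr and isinstance(expr[0], str) and expr[0] in user_headers:
        head, *args = expr
        return ["header", head] + [normalize_headers(a, user_headers) for a in args]
    return [normalize_headers(e, user_headers) for e in expr]
```

For example, ["and", ["tag", "inbox"], ["List", "notmuch"]] becomes
["and", ["tag", "inbox"], ["header", "List", "notmuch"]], so the rest
of the parser only ever sees the explicit form.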

In [2], Hannu mentioned being unclear on the design goals of the
s-expression query parser, so let me try to articulate the main
design goals a bit better. I think the existing query parser is great
for making "easy things easy". But when things are not easy, or the
user wants better diagnostics, it is nice to have an alternative.

A) More consistent / predictable syntax.

The notmuch query parser adds several features to the Xapian query
parser. Mainly for implementation reasons, this has resulted in a
somewhat quirky syntax, and often fairly painful escaping. Probably
the most egregious syntax quirk is that '*' (for all messages) cannot
be composed with other queries. Fixing this should in particular
simplify, and make more reliable, code like "notmuch-search-filter",
which tries to combine an existing query with a user-specified filter.
With the new parser, those 15-20 lines can be replaced by

`(and (infix ,existing) (infix ,new))
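The same composition can be sketched outside Emacs. The helpers below
are hypothetical (not part of notmuch): they quote two infix query
strings and wrap them in (and (infix ...) (infix ...)). They assume
that strings in the s-expression syntax are double-quoted, with a
backslash escaping '"' and '\'; that escaping rule is an assumption
of this sketch.

```python
def sexp_string(s):
    """Quote s as a double-quoted s-expression string (assumed escaping rules)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def combine_filters(existing, new):
    """Combine two infix queries into one s-expression query.

    Each side is passed through unchanged as an (infix ...) subquery, so
    no special case is needed when existing is '*'.
    """
    return f"(and (infix {sexp_string(existing)}) (infix {sexp_string(new)}))"
```

For example, combine_filters("*", "tag:inbox") yields
(and (infix "*") (infix "tag:inbox")), and embedded double quotes in
either query are escaped rather than re-parsed.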

B) Better error reporting.

Xapian's query parser is designed to be permissive and almost never
rejects a query string. This is not always ideal, particularly when
debugging programmatically constructed queries.

C) Extensibility

The Xapian Query API has functionality that is not (yet) exposed via
the QueryParser. It turns out that some common feature requests are
easy to add [3]. For example, to match messages with a List-Id header,
you can use '(header List :any)'. 
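A rough sketch of why such additions are cheap: once a query is
parsed into nested lists, a keyword like :any is just one more branch
in the translator. Everything here is hypothetical: the
HEADER_PREFIXES table, the prefix string, and the tuple-based query
tree stand in for what a real implementation would build with the
Xapian Query API.

```python
# Hypothetical mapping from user-defined header names to term prefixes.
HEADER_PREFIXES = {"List": "XUList"}


def translate(expr):
    """Translate a parsed s-expression into a toy query tree.

    Supported (toy) forms:
      (and q ...)          -> ("and", [...])
      (header NAME :any)   -> ("wildcard", prefix)   any message with the header
      (header NAME value)  -> ("term", prefix + value)
    """
    if not isinstance(expr, list) or not expr:
        raise ValueError(f"expected a non-empty form, got {expr!r}")
    op, *args = expr
    if op == "and":
        return ("and", [translate(a) for a in args])
    if op == "header":
        if len(args) != 2:
            raise ValueError("header takes a name and a value")
        name, value = args
        if name not in HEADER_PREFIXES:
            raise ValueError(f"unknown header {name!r}")
        prefix = HEADER_PREFIXES[name]
        if value == ":any":
            return ("wildcard", prefix)
        return ("term", prefix + value)
    raise ValueError(f"unknown operator {op!r}")
```

Unknown operators and headers raise errors instead of being silently
treated as search terms, which also serves goal B.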

[1]: id:20210714000239.804384-1-david@tethera.net
[2]: id:60f190f8.1c69fb81.7e7d2.40d1@mx.google.com
[3]: In fairness, they would probably be fairly easy to add to the
Xapian QueryParser as well. But then we'd need to depend on a
sufficiently recent version of Xapian.

notmuch mailing list -- notmuch@notmuchmail.org