Peter Münster <pm@a16n.net> writes:

> On Thu, Mar 20 2025, David Bremner wrote:
>
>> not getting enough matches with a word search?
>
> Yes, indeed.
>
>> Can you give me an example of the kind of search you are trying to do?
>
> I would like to find all messages with the substring "identité":
> - identité
> - identités
> - l'identité
> - l’identité
> - d'identité
> - d’identité

Thanks for explaining your use case. I previously thought that regex
search would not be helpful on single terms, but I can see it would be a
workaround for notmuch's inadequate unilingual stemming (which is a
harder problem to fix). The following source change seems to enable it,
at least for s-expression queries:

diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 930888e9..7ce218fe 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -85,7 +85,7 @@ static _sexp_prefix_t prefixes[] =
     { "attachment",     Xapian::Query::OP_AND,          SEXP_INITIAL_MATCH_ALL,
       SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_EXPAND },
     { "body",           Xapian::Query::OP_AND,          SEXP_INITIAL_MATCH_ALL,
-      SEXP_FLAG_FIELD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_REGEX},
     { "date",           Xapian::Query::OP_INVALID,      SEXP_INITIAL_MATCH_ALL,
       SEXP_FLAG_RANGE },
     { "from",           Xapian::Query::OP_AND,          SEXP_INITIAL_MATCH_ALL,

The test suite and documentation would need to be adjusted, but I think
we could probably support that in the next major release of notmuch
(0.40). If you are comfortable building from source, you can of course
just make the change in your own build of notmuch. With that change your
query could be done as

    NOTMUCH_DEBUG_QUERY=t ./notmuch count --query=sexp '(body (rx identité))'

It does take about 5 seconds to run on this fairly fast computer, with
my ~800k messages. Emacs integration would be a separate question, and
would probably require a hard build dependency on the sfsexp library,
but that is a discussion already started.

In principle a similar change should work for the Xapian (infix) query
parser, but unfortunately there are some complications that I didn't
manage to (quickly) debug. So I don't know if we can support the infix
syntax or not. I don't think that's a blocker, as there are already
several kinds of search that are only supported in the s-expression
query syntax.

> And, less important, it would be nice (it fails with mu) to search in
> html-only messages. Example:
>
> "/v.*hicule/" should match "véhicule"
>

This won't work in notmuch either, because "véhicule" is indexed as two
or three terms (words).
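
To illustrate both points, here is a toy sketch in Python. It is not
notmuch or Xapian code. It assumes the regex is tried against one indexed
term at a time, that apostrophes stay inside a term, and that the
html-only "véhicule" ended up split into the two terms shown. The term
list and both function names are made up for the example, and stemming
is ignored.

```python
import re

# Hypothetical body terms, standing in for what an index might contain.
# Assumption: "l'identité" is one term, and "véhicule" was split in two.
TERMS = [
    "identité", "identités", "l'identité", "d'identité",
    "v", "hicule",
]


def word_match(word, terms):
    """Return the terms equal to the word (plain word search, no stemming)."""
    return [t for t in terms if t == word]


def expand_regex(pattern, terms):
    """Return the terms in which the pattern matches anywhere (unanchored)."""
    rx = re.compile(pattern)
    return [t for t in terms if rx.search(t)]


print(word_match("identité", TERMS))     # only the exact term
print(expand_regex("identité", TERMS))   # every term containing the substring
print(expand_regex("v.*hicule", TERMS))  # empty: no single term contains both halves
```

Under those assumptions the regex finds the apostrophe variants that the
word search misses, while "v.*hicule" finds nothing, because a regex that
is matched per term cannot span two terms.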