Re: [Patch v4] lib: regexp matching in 'subject' and 'from'

Subject: Re: [Patch v4] lib: regexp matching in 'subject' and 'from'

Date: Fri, 10 Feb 2017 08:29:05 +0000

To: David Bremner, Jani Nikula, Tomi Ollila, notmuch@notmuchmail.org

Cc:

From: Mark Walters


On Thu, 09 Feb 2017, David Bremner <david@tethera.net> wrote:
> Jani Nikula <jani@nikula.org> writes:
>
>>
>> Theoretically "/" is an acceptable character in message-ids [1]. Rare,
>> unlikely, but acceptable. Searching for message-id's beginning with "/"
>> would have to use regexps, which would break in all sorts of ways
>> throughout the stack. I don't think there are handy alternatives to
>> "/<regex>/", given the characters that are acceptable in message-ids,
>> but this is something to think about.
>
> Would telling the user to \ escape ( or double /) the initial / be good
> enough there? This would disable regex processing.  I guess this goes
> back to someone's earlier suggestion.  A third option would be to use
> single quotes there ("id:'/foo'"), but that isn't really consistent with either Xapian
> or usual regex conventions.
>
> So I guess my favourite idea ATM is to use id:\/some/crazy/message-id
> FWIW, I don't have any such message ids.
>
>> For example, could the regexp matcher for message-ids first check if the
>> "regexp" is a strict match with "/" and all, and accept those? This
>> might be a reasonable workaround if it can be made to work.
>
> We're building a query, so I think the equivalent is to make an OR, with
> the exact match and the regex posting source. That could be done,
> although I'm a bit uneasy about how this makes the syntax for id:
> different, so id:/foo would be legit, but from:/foo would be an error.
> Maybe the dwim-factor is worth it.

Hi

Broadly I like the backslash escaping option. Two thoughts. First, can
any of the fields (from/subject/message-id) start with a "\" anyway? I
think not, but thought it worth checking.
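
To make the escaping rule concrete, here is a minimal sketch of how a
field value might be classified under the backslash option. It is not
the code from the patch: the name classify_value is hypothetical, and
treating an unterminated leading "/" as an error is an assumption based
on the remark above that from:/foo would be an error.

```python
from typing import Tuple


def classify_value(value: str) -> Tuple[str, str]:
    """Classify a field value as ("regex", pattern) or ("literal", text).

    Hypothetical helper, for illustration only.
    """
    if value.startswith("/"):
        # /pattern/ : treat the text between the slashes as a regexp.
        if len(value) >= 2 and value.endswith("/"):
            return ("regex", value[1:-1])
        # Assumption: a leading "/" with no closing "/" is rejected.
        raise ValueError("unterminated regexp: " + value)
    if value.startswith("\\/"):
        # \/text : the backslash disables regexp processing, so the
        # value is the literal text starting at the "/".
        return ("literal", value[1:])
    # Anything else is an ordinary literal value.
    return ("literal", value)
```

With this rule, "\/some/crazy/message-id" is read as the literal id
"/some/crazy/message-id". A value that genuinely began with "\/" could
not be written under this sketch, which is why the question above
matters.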

Secondly, message-ids are often round-tripped, that is, output by notmuch
and then fed back to notmuch. Do we want to escape the output as above
before printing in any cases? My view is that if we output the
message-id prefixed with "id:" then we should escape it (which applies
with --output=messages --format=text), but if we don't print the "id:"
part then we shouldn't (e.g. with --format=json). A similar thing would
apply to emacs: if it is a normal stash then escape the id, but if it is
a "bare stash" then do not.
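
Roughly, the output rule I have in mind looks like the sketch below.
The function names are made up for illustration and do not correspond
to anything in notmuch; the only escaping assumed is a backslash in
front of a leading "/".

```python
def escape_id(message_id: str) -> str:
    """Backslash-escape a leading "/" so the id is not read as a regexp."""
    if message_id.startswith("/"):
        return "\\" + message_id
    return message_id


def format_message_id(message_id: str, with_prefix: bool) -> str:
    """Format a message-id for output (hypothetical helper).

    with_prefix=True models output that is itself a search term, such
    as "id:..." lines or a normal stash, so the id is escaped.
    with_prefix=False models a bare id, such as a JSON field or a bare
    stash, which is printed unchanged.
    """
    if with_prefix:
        return "id:" + escape_id(message_id)
    return message_id
```

So an id of "/some/crazy/message-id" would be printed as
"id:\/some/crazy/message-id" when prefixed and left untouched when
bare, and ids that do not start with "/" are never altered.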

Actually, one more thing: it would be a shame to block or significantly
delay the series for such a corner case.

Best wishes

Mark



