Re: Query emails sent to undisclosed-recipients

Subject: Re: Query emails sent to undisclosed-recipients

Date: Thu, 15 Apr 2021 12:46:02 +1000

To: Tomi Ollila, David Bremner, Firmin Martin, notmuch@notmuchmail.org

Cc:

From: NeilBrown


On Tue, Mar 23 2021, Tomi Ollila wrote:

> On Tue, Mar 23 2021, David Bremner wrote:
>
>> Tomi Ollila <tomi.ollila@iki.fi> writes:
>>
>>> On Tue, Mar 23 2021, Firmin Martin wrote:
>>>
>>>> Hi,
>>>>
>>>> I have emails whose "To" field is undisclosed recipients. In JSON:
>>>>
>>>> ```
>>>> "To": "undisclosed-recipients: ;"
>>>> ```
>>>>
>>>> I would want to tag such email as spam, but I can't query them
>>>> using 
>>>>
>>>> ```
>>>>  notmuch show --format=json to:"undisclosed-recipients: ;"
>>>> ```
>>>>
>>>> or any variation (regex etc.).
>>>>
>>>> This question was already addressed in 2013 [1]. Are there any plans
>>>> to implement this feature, or is there an available workaround?
>>>
>>> Tried. many things. did not work. notmuch-search-terms(7) tells
>>>
>>>      to:<name-or-address>
>>>
>>> (so no regex syntax...)
>>>
>>> I don't know why that doesn't work. IIRC no plan, but patches welcome >;D
>>
>> The (light) technical background is that regex syntax in notmuch
>> requires value slots, and someone (TM) would need to evaluate how much
>> adding a value slot for to: would cost in terms of database size / speed
>> of queries.
>>
>> I think there's a separate question about address groups being ignored,
>> discussed in the linked thread.
>
> But the question is why doesn't to:undisclosed-recipients:
> or to:undisclosed-recipients work?

Because "undisclosed-recipients:" is not an address or a comment (in
RFC822 / RFC5322 syntax).  It is a label (a name for a group of addresses).
It is not syntactically valid to have an empty "to:" field (though
RFC5322 does allow the field to be left out altogether).  The only valid
syntax for a "to:" field which doesn't actually give any address is
"label:;".

These messages don't actually have any "to" address.
So
   notmuch search "not to:*"
should work... except that it doesn't.

    notmuch search --output=files "not (to:a* OR to:b* OR to:c* OR to:d* \
    OR to:e* OR to:f* OR to:g* OR to:h* OR to:i* OR to:j* OR to:k* \
    OR to:l* OR to:m* OR to:n* OR to:o* OR to:p* OR to:q* OR to:r* \
    OR to:s* OR to:t* OR to:u* OR to:v* OR to:w* OR to:x* OR to:y* OR to:z*)"

does work (as long as no addresses start with a non-alpha character).
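
For anyone who would rather not type all 26 terms, here is a minimal
sketch that builds the same query string in a loop.  The function name
build_to_query is made up for illustration; it only prints the query and
makes the same assumption that every address starts with a letter.

```shell
# Print: not (to:a* OR to:b* OR ... OR to:z*)
# build_to_query is a hypothetical helper, not part of notmuch.
build_to_query() {
    query=""
    for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        if [ -z "$query" ]; then
            query="to:${letter}*"
        else
            query="${query} OR to:${letter}*"
        fi
    done
    printf 'not (%s)\n' "$query"
}

# Usage (needs a configured notmuch database):
#   notmuch search --output=files "$(build_to_query)"
```

The quoting matters: the query has to reach notmuch as a single
argument so the shell never tries to expand the "*" wildcards.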

I piped the output of that search into
    xargs grep -i '^to:' | grep -v -i ': *;'

Some of the matches had an empty 'to:' which is syntactically invalid.
Others had "<>" as the address.  I don't think this is legal in "to:",
but I've seen it used in Return-path: a lot, and as far as I can tell
RFC5322 only allows it there.
The rest was in the noise.
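
The categories above can be told apart mechanically.  This is a rough
sketch only: classify_to is a made-up helper, it looks at a single
unfolded "To:" line, and the regular expression is a simplification of
the RFC5322 group syntax (it does not handle quoted display names or
comments).

```shell
# Classify one "To:" header line as: empty, null-address, empty-group, other.
# classify_to is a hypothetical helper; it ignores folded (multi-line) headers.
classify_to() {
    # Strip the "To:" prefix and any surrounding whitespace.
    value=$(printf '%s\n' "$1" |
        sed -e 's/^[Tt][Oo]:[[:space:]]*//' -e 's/[[:space:]]*$//')
    if [ -z "$value" ]; then
        echo "empty"            # "To:" with nothing after it
    elif [ "$value" = "<>" ]; then
        echo "null-address"     # "<>" as seen in Return-path:
    elif printf '%s\n' "$value" | grep -Eq '^[^:;@<>]+:[[:space:]]*;$'; then
        echo "empty-group"      # e.g. "undisclosed-recipients: ;"
    else
        echo "other"
    fi
}

# Example:
#   classify_to 'To: undisclosed-recipients: ;'
```

Only the "empty-group" case is the valid "label:;" form; the first two
are the malformed headers that showed up in the grep output.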

NeilBrown