Re: Initial tagging all mail from mailing lists

Subject: Re: Initial tagging all mail from mailing lists

Date: Fri, 16 May 2025 06:59:27 +0900

To: Gregor Zattler, notmuch@notmuchmail.org

From: David Bremner


Gregor Zattler <telegraph@gmx.net> writes:

> Hi David,
> * David Bremner <david@tethera.net> [2025-05-15; 15:51 +09]:
>> Gregor Zattler <telegraph@gmx.net> writes:
>>
>>>
>>> If you would like to tag all emails from
>>> all lists:
>>>
>>> notmuch tag +news -- 'List:*'
>>>
>>
>> I don't think that part will work correctly. As far as I know you need
>> to use a sexp query, as I mentioned in my other message.
>
> It works for me, except for emails which were indexed before I made
> this configuration.

I'm afraid it's just pretending to work. The default query parser very
rarely reports an error, so it's actually just searching for the word
"list", anywhere. It might approximate what you want. Compare:

    ╭─ motzkin:~
    ╰─% NOTMUCH_DEBUG_QUERY=t notmuch count --exclude=false  '(List *)'
    Query string is:
    (List *)
    Exclude query is:
    Query()
    Final query is:
    Query((Tmail AND (list@1 OR Glist@1 OR Klist@1 OR Klist@1 OR Qlist@1 OR Qlist@1 OR Plist@1 OR XPROPERTYlist@1 OR XFOLDER:list@1 OR XFROMlist@1 OR XTOlist@1 OR XATTACHMENTlist@1 OR XMIMETYPElist@1 OR XSUBJECTlist@1 OR XUList:list@1)))

The big "OR" is how we implement "find this word anywhere".

    ╭─ motzkin:~
    ╰─% NOTMUCH_DEBUG_QUERY=t notmuch count --exclude=false --query=sexp '(List *)'
    Query string is:
    (List *)
    Exclude query is:
    Query()
    Final query is:
    Query((Tmail AND (<alldocuments> AND WILDCARD SYNONYM XUList:)))
    555917

Here the sexp query parser is generating "WILDCARD SYNONYM XUList:" from
'(List *)', i.e. a wildcard over every term stored under the List header
prefix.

Apparently I have a fair number of mailing list messages that don't
mention the word List in the content.
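
To repeat this comparison for another query, a small shell function along
these lines may help. It is only a sketch: compare_parsers is a made-up
name, and it uses nothing beyond the count options shown in the two
transcripts above.

```shell
# Hypothetical helper (not part of notmuch): run the same query string
# through the default (infix) parser and through the sexp parser, and
# print both counts so they can be compared.
compare_parsers() {
    query=$1
    printf 'infix: '
    notmuch count --exclude=false "$query"
    printf 'sexp:  '
    notmuch count --exclude=false --query=sexp "$query"
}
```

Calling compare_parsers '(List *)' prints the two counts one after the
other. If they differ, the default parser is most likely treating the
query as a plain word search.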

[snip...]

> I would like to reindex this one file but I do not manage to provide
> the correct format of the path: search term:

Notmuch doesn't index file names, only directory names (it stores the
file names so they can be printed, but not as document terms in the
database).

> $ notmuch search --output=files
> 'path:"/home/grfz/Mail/~ml/linux-l@mlists.in-berlin.de/cur/1060758271.3350_0.pit:2,"'

You don't need, or want, the leading / here; the directories (folders)
are indexed relative to the mail root. The following should work, but
won't match the individual file.

     notmuch search --output=files 'path:"~ml/linux-l@mlists.in-berlin.de/cur/"'

> It is possible to reindex a single file:
>
>  notmuch reindex path:~ml/linux-l@mlists.in-berlin.de/cur/1060758271.3350_0.pit-to-be-deleted

I don't know precisely what happens for you, but try

    notmuch count path:~ml/linux-l@mlists.in-berlin.de/cur/1060758271.3350_0.pit-to-be-deleted

in place of reindex. I think you will see the answer is 0, which means
reindex won't do anything.

> But strangely I'm not able to search for this one file:

This does not surprise me, as explained above.

> Can you/someone help me with formating / escaping / quoting filenames
> in path: search terms?

Unfortunately no amount of quoting or escaping will help, as filenames
are just not supported in queries. They could be, but it would require a
complete redesign of the database schema. That might happen at some
point, but it could be a while.
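
In the meantime, the directory-only lookup described above can be
scripted. The following is a minimal sketch, assuming a POSIX shell and
a file that lives in a subdirectory of the mail root. dir_term and
find_indexed_file are hypothetical helper names, not notmuch commands,
and the mail root is passed in by hand rather than read from the
configuration.

```shell
# Hypothetical helper: turn an absolute file name into a directory-only
# path: term, relative to the mail root (no leading /, no file name).
# Assumes the file is in a subdirectory of the mail root.
dir_term() {
    mail_root=${1%/}               # tolerate a trailing slash on the root
    file=$2
    rel=${file#"$mail_root"/}      # strip the mail root prefix
    printf 'path:"%s/"\n' "${rel%/*}"   # strip the file name, keep the directory
}

# Hypothetical helper: list the files of all messages in that directory
# and keep only the line that is exactly the file asked for.
find_indexed_file() {
    notmuch search --output=files "$(dir_term "$1" "$2")" | grep -Fx -- "$2"
}
```

For example, dir_term /home/grfz/Mail followed by the full file name
from the quoted search prints the same path: term as the command I gave
above. find_indexed_file then only tells you whether notmuch knows about
that file; it does not make the file name itself searchable.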
