Frederick Eaton <frederik@ofb.net> writes:

>
> Suppose the filter script reads a message from a particular file and decides that it is
> spam. How does the filter tell Notmuch that the message corresponding to that file is spam?
> You seem to be saying below that the filter script should extract the Message-ID and use it
> to identify the message to Notmuch, since file paths of the messages are not
> indexed. Probably what my script should be doing for each message is appending a line to a
> batch file like this:
>
>     +spam -new -- id:some_message_id@foo
>     +inbox -new -- id:some_other@baz
>
> and then passing the batch file to "notmuch tag"?
>

Hello Frederick, you are exactly correct.

This is what I've written to handle spam filtering in my notmuch post-new hook. Like you, I
have notmuch configured to tag newly fetched mail with "new".

    notmuch search --output=messages 'tag:new' > /tmp/msgs
    notmuch search --output=files 'tag:new' |\
      bogofilter -o0.7,0.7 -bt |\
      paste - /tmp/msgs |\
      awk '$1 ~ /S/ { print "-new +spam", "--", $3 }' |\
      notmuch tag --batch

This should run under any POSIX shell. My chosen filter is bogofilter. The -bt flags tell it
to operate on a stdin "batch" of file paths and return a "terse" summary of results, e.g.

    H 0.248913
    S 0.999999

This script assumes that notmuch returns results in the same order for both queries, which
is fortunately the case.

>>>I've tentatively concluded that the best way to locate each message in the Notmuch database
>>>is to extract the Message-ID and search for it with "id:"? But the FAQ says that multiple
>>>messages can have the same Message-ID (and some spam messages don't have one at all).

Your instinct to use batch tagging and id: queries is correct. I collect my new message ids
in /tmp/msgs. In practice these ids are unique enough to be used to tag individual messages
on a daily basis.
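Since the pipeline pairs verdicts with ids purely by line position, a cheap safeguard is to
check that both lists have the same length before tagging anything. The sketch below is only
an illustration: spam_batch_lines is a hypothetical helper name, and it assumes the verdicts
were saved to a file first (one "S 0.999999" or "H 0.248913" line per message) instead of
being piped straight through. As far as I understand, --output=files can list more than one
file for a message that has duplicates, and that is the kind of mismatch this would catch.

```shell
# Hypothetical helper (not part of notmuch or bogofilter).
# $1: file with one terse verdict per line, e.g. "S 0.999999"
# $2: file with one "id:..." term per line, assumed to be in the same order
spam_batch_lines() {
    # Refuse to pair the lists if their lengths differ; pairing by line
    # position would then attach verdicts to the wrong messages.
    if [ $(wc -l < "$1") -ne $(wc -l < "$2") ]; then
        echo "verdict/id count mismatch, refusing to tag" >&2
        return 1
    fi
    # paste joins the two files line by line; awk keeps only spam verdicts
    # and emits one "notmuch tag --batch" line per spam message.
    paste "$1" "$2" | awk '$1 == "S" { print "-new +spam -- " $3 }'
}

# Possible use in a post-new hook (file names are placeholders):
#   notmuch search --output=messages 'tag:new' > /tmp/msgs
#   notmuch search --output=files 'tag:new' | bogofilter -o0.7,0.7 -bt > /tmp/verdicts
#   spam_batch_lines /tmp/verdicts /tmp/msgs | notmuch tag --batch
```

If the counts differ, nothing is printed on stdout, so "notmuch tag --batch" receives no
input and the "new" tags stay in place for a later run or a manual look.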

If you prefer to tag entire threads as spam the moment a single message is spam, you can
simply use

    notmuch search --output=threads 'tag:new' > /tmp/msgs

I prefer to manually mute threads with a mute tag, but thread ids are definitely unique.

If you want to auto-tag spam in an existing archive, you will first need to manually tag a
good quantity of messages (100-1000) that you consider to be spam and a good quantity of
messages (100-1000) that you consider to be ham, and use them to train the filter, e.g.

    notmuch search --output=files 'tag:spam' | bogofilter -bs
    notmuch search --output=files 'tag:inbox' | bogofilter -bn

>>>If I could access the message using the filename that the script is processing, it would
>>>seem slightly more reliable. It seems like there should be some way to allow a Notmuch
>>>database entry to be accessed directly by filename, without even creating a Notmuch-style
>>>search query containing that filename, but rather by passing the filename as a command-line
>>>argument to "notmuch". It would be nice not to have to worry about quoting and unquoting.
>>
>>I am not sure if this is useful, given that (presumably) Notmuch uses message IDs as
>>keys. Besides, those filenames are usually generated automatically and quite cryptic.
>
> It might be useful for the reasons I stated, namely in case the Message-ID does not exist or
> is not unique.

I think mail that is successfully transmitted through a mail host necessarily obtains a
message id, but I might be wrong. I believe notmuch indexes on both its own unique thread
ids and the message ids, which further decreases the already minuscule chance of message id
collisions.

--
Best,
Panos