notspam: a notmuch interface to spamassassin

Subject: notspam: a notmuch interface to spamassassin

Date: Tue, 05 Mar 2013 22:43:12 -0800

To: Notmuch Mail

Cc:

From: Jameson Graef Rollins


Hey, folks.  I put together a little python program as an interface
between notmuch and spamassassin (sa) that I thought others might be
interested in:

git://finestructure.net/notspam

Its only dependencies are a running local sa daemon and python-notmuch.
It's pretty straightforward: it's just a single python script with two
main functions, 'learn' and 'tag'.  'Learn' takes a notmuch search and
pipes the resulting messages into sa (via sa-learn) to train it on them
as ham or spam.  'Tag' takes a notmuch search and passes the resulting
messages through the sa classifier (via spamc) to be tagged as ham or
spam.
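
For anyone curious how the two operations might fit together, here is a
minimal sketch in Python.  It is not the actual notspam code: it assumes
the legacy python-notmuch bindings (notmuch.Database, notmuch.Query),
passes message file names to sa-learn instead of piping message text,
and relies on 'spamc -c', whose exit status spamc(1) documents as 1 for
spam and 0 otherwise.  The helper names and the 500-file chunk size are
hypothetical.

```python
# Hypothetical sketch of notspam's 'learn' and 'tag' operations; NOT the
# actual notspam source.  Assumes the legacy python-notmuch bindings and
# SpamAssassin's sa-learn and spamc on $PATH.
import subprocess
from typing import Iterable, List

CHUNK = 500  # hypothetical: files per sa-learn call, to stay under ARG_MAX


def sa_learn_argv(kind: str, filenames: Iterable[str]) -> List[str]:
    """Build an sa-learn command that trains on the given message files."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind] + list(filenames)


def spamc_says_spam(status: int) -> bool:
    """Interpret `spamc -c`: 1 means spam, 0 means not spam."""
    if status not in (0, 1):
        raise RuntimeError("unexpected spamc exit status %d" % status)
    return status == 1


def learn(kind: str, search: str) -> None:
    import notmuch  # imported lazily so the helpers above work without it
    db = notmuch.Database()  # read-only access is enough for training
    query = notmuch.Query(db, search)
    files = [msg.get_filename() for msg in query.search_messages()]
    for i in range(0, len(files), CHUNK):
        subprocess.run(sa_learn_argv(kind, files[i:i + CHUNK]), check=True)


def tag(search: str, spam_tag: str) -> None:
    import notmuch
    db = notmuch.Database(mode=notmuch.Database.MODE.READ_WRITE)
    for msg in notmuch.Query(db, search).search_messages():
        with open(msg.get_filename(), "rb") as f:
            # spamc reads the message on stdin; -c prints "score/threshold"
            result = subprocess.run(["spamc", "-c"], stdin=f,
                                    stdout=subprocess.DEVNULL)
        if spamc_says_spam(result.returncode):
            msg.add_tag(spam_tag)
        # What notspam does with ham isn't described; ham is left alone here.
    db.close()
```

Only the two pure helpers can run without a notmuch database; learn()
and tag() need python-notmuch and a working sa installation.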

Here's how I've been using it:

 * Tag spam manually with the tag 'spam'.  Ideally, do this for a while
   first to build up a decent body of manually classified mail.

 * Once you've got some manual classification, teach sa:

   notspam learn spam tag:spam
   notspam learn ham not tag:spam

   Everything after the meat ('spam'/'ham') is the notmuch search.
   Rerun this periodically to keep sa up to date, but you might want to
   restrict the search a little so sa-learn doesn't spend a lot of time
   reprocessing old messages whose classification hasn't changed.

 * Call 'notspam tag' in your post-new hook (all my new messages are
   tagged 'new' initially):

   notspam tag --spam=spamd tag:new

   I give the sa-classified mail a different tag ('spamd' here, rather
   than 'spam') so it's easy to tell what I classified from what sa
   classified.
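
The command lines above share a simple shape: a subcommand, then (for
'learn') the class, then free-form notmuch search terms.  A hypothetical
argparse sketch of that shape, plus a helper for narrowing a re-learning
search, might look like this.  The default '--spam' tag and the narrow()
helper are assumptions, not notspam's documented behavior.

```python
# Hypothetical sketch of a command line shaped like the examples above;
# not notspam's real parser.  The default tag for --spam is an assumption.
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="notspam")
    sub = parser.add_subparsers(dest="command", required=True)

    learn = sub.add_parser("learn", help="train sa on a notmuch search")
    learn.add_argument("kind", choices=["spam", "ham"])
    learn.add_argument("terms", nargs="+", help="notmuch search terms")

    tag = sub.add_parser("tag", help="classify a notmuch search via spamc")
    tag.add_argument("--spam", default="spam", metavar="TAG",
                     help="tag to add to messages sa calls spam")
    tag.add_argument("terms", nargs="+", help="notmuch search terms")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Everything after the class (or options) is joined into one query.
    # Shell quoting inside individual terms is not preserved.
    args.query = " ".join(args.terms)
    return args


def narrow(query: str, extra: str) -> str:
    """AND an extra restriction onto a query, grouping each part."""
    return "(%s) and (%s)" % (query, extra)
```

For example, narrow("not tag:spam", "tag:inbox") gives
"(not tag:spam) and (tag:inbox)", where tag:inbox is only a stand-in for
whatever restriction suits your mail.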

Pretty simple.  See 'notspam help' for more info.

Right now it's geared specifically for sa, but it would be easy to
expand it to handle arbitrary learn/classify commands.  If there's any
further interest in this, I would be happy to help push on it more.
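
To illustrate what handling arbitrary learn/classify commands could
mean, here is a hypothetical sketch that describes a classifier by its
commands and exit statuses rather than hard-coding sa.  Only the
SpamAssassin entry is filled in; the Backend type and its field names
are made up for illustration.

```python
# Hypothetical sketch: describe a classifier by its commands instead of
# hard-coding sa.  Field names are illustrative, not notspam's.
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Backend:
    learn_spam: Tuple[str, ...]  # argv prefix; message file names are appended
    learn_ham: Tuple[str, ...]
    classify: Tuple[str, ...]    # argv that reads one message on stdin
    spam_status: int             # exit status of `classify` meaning spam
    ham_status: int              # exit status of `classify` meaning ham

    def learn_argv(self, kind: str, filenames: Iterable[str]) -> List[str]:
        prefix = {"spam": self.learn_spam, "ham": self.learn_ham}[kind]
        return list(prefix) + list(filenames)

    def is_spam(self, status: int) -> bool:
        if status == self.spam_status:
            return True
        if status == self.ham_status:
            return False
        raise RuntimeError("classifier failed with exit status %d" % status)


SPAMASSASSIN = Backend(
    learn_spam=("sa-learn", "--spam"),
    learn_ham=("sa-learn", "--ham"),
    classify=("spamc", "-c"),
    spam_status=1,
    ham_status=0,
)
```

Adding another classifier would then mean filling in one more Backend
with that tool's own commands and exit codes.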

jamie.

PS: if anyone has suggestions for Bayesian classifiers better than sa,
I'm all ears.  I'm not so happy with sa at the moment: it misses a lot
more spam than I would like.  Maybe I just haven't tuned it properly
yet, in which case I'd also welcome suggestions on improving sa's
classification.