[notmuch] Initial tagging

Subject: [notmuch] Initial tagging

Date: Thu, 25 Feb 2010 16:25:16 -0500

To: notmuch@notmuchmail.org

Cc:

From: James Vasile


I'm slowly groping my way toward using the notmuch Emacs client as my
routine MUA.  As I coerce it into tagging and displaying mail the way I
want, the next big question is how to get new mail into notmuch and tag
it automatically.

I'm curious as to what people are doing in this regard.

My solution involves cron running a sync_email script.  Sync_email uses
a lock directory and a PID file to make sure only one instance ever runs
at a time, and it logs its progress to syslog.  The script runs
offlineimap, then a mail_filter script that sorts mail into maildirs (for
Wanderlust, the MUA I'm hoping to leave behind), and finally a shell
script that runs notmuch new and does the initial tagging.
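The sequence sync_email runs could also be written as a small helper that logs each step to syslog and stops at the first failure, which the script below does not check for. This is a hypothetical restructuring, not the script itself: run_step, log, sync_mail and the syslog tag "sync_email" are made-up names, the paths use $HOME in place of /home/vasile, and logger(1) is assumed to be installed.

```bash
#!/bin/bash
# Sketch: run each stage of the mail sync in order, logging start, finish
# and failure to syslog.  run_step, log and sync_mail are hypothetical names.

log() {
    # logger(1) writes to syslog; -s also copies the message to stderr,
    # -t sets the tag the message is filed under.
    logger -s -t sync_email -- "$*"
}

run_step() {
    log "starting: $*"
    if "$@"; then
        log "done: $*"
    else
        local status=$?      # exit status of the failed command
        log "failed (exit $status): $*"
        return "$status"
    fi
}

# Example pipeline using the commands from sync_email; each step only
# runs if the previous one succeeded.
sync_mail() {
    run_step offlineimap -l "$HOME/.offlineimap/log" &&
    run_step "$HOME/bin/mail_filter.py" &&
    run_step "$HOME/bin/notmuch-tag.sh"
}
```

Chaining the calls with && is what makes a failed offlineimap run skip the filter and tagging passes; the original script runs every step regardless.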

The tagging script uses the inbox tag to identify new mail, tags it
according to a list of rules, then removes the inbox tag from anything
that matched a rule.  Uncategorized mail keeps the inbox tag so I can
inspect it later and make rules for it (or tag it manually).
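In the script below the final "-inbox" pass lists the category tags by hand, separately from the rules. A sketch of one way to keep the two in step: store the rules in a single table and derive the removal query from the tags they add. The "TAGS|QUERY" rule format and print_commands are hypothetical, and the notmuch commands are only printed, not executed.

```bash
#!/bin/bash
# Sketch: rules live in one table; the final "-inbox" query is built from
# every "+tag" the rules add.  Rule format "TAGS|QUERY" is hypothetical.
# Commands are printed, not executed.

rules=(
    "+friend +balktick|balktick"
    "+okos|enright@example.com"
    "+list +notmuch|to:notmuchmail.org or notmuch"
)

print_commands() {
    local rule tags query t removal=""
    local -A seen=()      # category tags already collected
    local -a added=()     # the same tags, in first-seen order

    for rule in "${rules[@]}"; do
        tags=${rule%%|*}      # text before the first "|"
        query=${rule#*|}      # text after the first "|"
        echo "notmuch tag $tags tag:inbox and ($query)"
        for t in $tags; do    # unquoted on purpose: split the tag list
            if [[ $t == +* && -z ${seen[${t#+}]:-} ]]; then
                seen[${t#+}]=1
                added+=("${t#+}")
            fi
        done
    done

    # Join as "tag:a or tag:b or ..." for the final removal pass.
    for t in "${added[@]}"; do
        removal+="${removal:+ or }tag:$t"
    done
    echo "notmuch tag -inbox tag:inbox and ($removal)"
}

print_commands
```

Unlike the hand-written list in the script, this drops inbox for every tag a rule adds (the script leaves +gmail out, for instance), so any tag that should stay in the inbox would need to be excluded.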

Also, prepending "tag:inbox and" to search criteria restricts the
tagging to a small subset of the db, which makes the tagging script run
fairly quickly.  My unexpurgated tagging script has almost 100 rules for
tagging, and I expect it to grow over time.
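One detail of the tagging script below is easy to miss: tag_new wraps each rule's query in parentheses before prepending "tag:inbox and". In notmuch's (Xapian's) query syntax "and" binds more tightly than "or", so without them a rule like "from:waring or to:waring" would be read as "(tag:inbox and from:waring) or to:waring" and retag matching mail across the whole database, not just new mail. A minimal sketch of the string being built; inbox_only is a hypothetical helper name and nothing here calls notmuch.

```bash
#!/bin/bash
# Sketch: restrict a rule's query to new mail the way tag_new does.
# inbox_only is a hypothetical name.

inbox_only() {
    # Parenthesize the rule so an "or" inside it stays under tag:inbox.
    printf 'tag:inbox and (%s)\n' "$1"
}

rule="from:waring or to:waring"
echo "unsafe: tag:inbox and $rule"    # read as (tag:inbox and from:waring) or to:waring
echo "safe:   $(inbox_only "$rule")"
```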

################## notmuch-tag.sh ################
#!/bin/bash

bin=/usr/local/bin/notmuch

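# Run the real notmuch binary, retrying while another process holds the
# Xapian write lock.  $1 is a whole command line, e.g.
# "tag +foo tag:inbox and (...)"; it is left unquoted in the call below
# on purpose so the shell splits it into separate arguments.
function notmuch {
    echo "$1"
    while true; do
	result=$($bin $1 2>&1)

	if [[ $result == *"already locked"* ]]; then
	    echo "Xapian DB busy.  Retrying in 2 seconds"
	else
	    if [ -n "$result" ]; then
		echo "$result"
	    fi
	    return
	fi

	sleep 2
    done
}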

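# tag_new TAGS QUERY: apply TAGS to new (tag:inbox) mail matching QUERY.
# QUERY is wrapped in parentheses so tag:inbox applies to all of it.
function tag_new { notmuch "tag $1 tag:inbox and ($2)"; }
# blacklist QUERY: drop matching new mail from inbox/unread, mark +delete.
# "$1" is quoted so the whole query reaches tag_new, not just its first word.
function blacklist { tag_new "-inbox -unread +delete" "$1"; }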

notmuch new

blacklist "from:xxx@example.com or from:yyy@example.com"

# voicemail
tag_new "-inbox +voicemail"  "from:ast@example.com"

# friends
tag_new "+friend +mathieu" "mathieu or ejm2106 or emily@example.com"
tag_new "+friend +balktick" "balktick"

# open community services
tag_new "+ocs" "open community services or opencommunityservices"
   
# okos
tag_new "+okos" "jim and glaser and not LinkedIn"
tag_new "+okos" "joshlevy.ny@example.com"
tag_new "+okos" "enright@example.com"

# book liberator
tag_new "+bklib" "wnf@example.com or bkrpr"

# joomla
tag_new "+osm" "from:waring or to:waring"
tag_new "+osm" "from:dave.huelsmann@example.com or to:dave.huelsmann@example.com"
tag_new "+osm" "james.vasile@example.com"
tag_new "+osm" "joomla"

# lists
tag_new "+list +notmuch" "to:notmuchmail.org or notmuch"
tag_new "+list +stumpwm" "to:stumpwm-devel@nongnu.org or stumpwm"
tag_new "+list +bklib" "to:bklib@googlegroups.com"
   
## Catchalls for sflc, hv, etc.
tag_new "+sflc" "not tag:list and not tag:friend and softwarefreedom.org and not tag:osm"
tag_new "+sflc" "to:firm@example.com"
tag_new "+hv" "hackervisions.org and not tag:list and not tag:friend"
tag_new "+gmail" "(to:jvasile@example.com or from:jvasile@example.com) and not tag:list and not tag:friend"

## Mark mine read
tag_new "-unread" "from:james@example.com"
tag_new "-unread" "from:vasile@example.com"
tag_new "-unread" "from:james.vasile@example.com"

## Remove inbox tag
tag_new "-inbox" "tag:sflc or tag:hv or tag:list or tag:osm or tag:okos or tag:friend or tag:bklib"
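The notmuch() wrapper at the top of this script retries forever while the database reports "already locked", so a lock that never clears would keep sync_email (and its own lock) held indefinitely, and later cron runs would simply exit. A sketch of a bounded variant, assuming the same "already locked" text in the error output; retry_locked, MAX_TRIES and DELAY are hypothetical names, and the command is passed as separate arguments rather than as one string to be split.

```bash
#!/bin/bash
# Sketch: run a command, retrying while its output says the Xapian
# database is "already locked", but give up after MAX_TRIES attempts.
# retry_locked, MAX_TRIES and DELAY are hypothetical names.

MAX_TRIES=${MAX_TRIES:-30}   # attempts before giving up
DELAY=${DELAY:-2}            # seconds between attempts

retry_locked() {
    local attempt result status
    for (( attempt = 1; attempt <= MAX_TRIES; attempt++ )); do
        result=$("$@" 2>&1)
        status=$?
        if [[ $result != *"already locked"* ]]; then
            if [[ -n $result ]]; then
                echo "$result"
            fi
            return "$status"    # pass through the command's own exit code
        fi
        echo "Xapian DB busy (attempt $attempt of $MAX_TRIES)" >&2
        if (( attempt < MAX_TRIES )); then
            sleep "$DELAY"
        fi
    done
    echo "giving up: database still locked" >&2
    return 1
}

# Usage, mirroring one tag_new rule:
#   retry_locked /usr/local/bin/notmuch tag +okos tag:inbox and "(enright@example.com)"
```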


############# sync_email #########################
#!/bin/sh

## Sync email unless we're already in the process of syncing.

SCRIPTNAME=$(basename "$0")
PIDDIR=/home/vasile/var/run/${SCRIPTNAME}
PIDFILE=${PIDDIR}/${SCRIPTNAME}.pid

## Do the double-lock with a dir and a pid file
if ! mkdir "${PIDDIR}" 2>/dev/null; then

    sleep 3 # give the other process time to write its pid

    if [ -f "${PIDFILE}" ]; then
        # Verify that the process is actually still running under this
        # pid.  Asking ps about that one pid avoids the false matches that
        # grepping all of "ps -ef" for the number can produce.
        OLDPID=$(cat "${PIDFILE}")
        if ps -p "${OLDPID}" -o args= 2>/dev/null | grep -q "${SCRIPTNAME}"; then
            logger -s "${SCRIPTNAME} already running! Exiting"
            exit 255
        fi
    fi
fi

## Update pid file ($$ is this script's own pid)
echo $$ > "${PIDFILE}"

logger -s starting offlineimap
offlineimap -l /home/vasile/.offlineimap/log
logger -s offlineimap done, starting mail filter
mairix --unlock
/home/vasile/bin/mail_filter.py
logger -s mail filter done, starting notmuch tagger
/home/vasile/bin/notmuch-tag.sh > /home/vasile/var/log/notmuch
logger -s notmuch tagger done, sync_email finished

## clean up pid file and dir
if [ -f ${PIDFILE} ]; then
    rm -rf ${PIDDIR}
fi
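The directory half of the double-lock works because mkdir is atomic: if several copies start at once, exactly one can create the directory. A trimmed-down sketch of the same idea, with a trap so the lock is also released if a step fails or the script is interrupted. acquire_lock, release_lock, main and the lock path are hypothetical, and the stale-pid check from sync_email is left out for brevity.

```bash
#!/bin/bash
# Sketch: single-instance lock using the atomicity of mkdir.
# acquire_lock, release_lock, main and LOCKDIR are hypothetical names.

acquire_lock() {
    # mkdir fails if the directory already exists, so only one caller wins.
    mkdir "$1" 2>/dev/null || return 1
    echo $$ > "$1/pid"    # record the owner for later stale-lock checks
}

release_lock() {
    rm -rf "$1"
}

LOCKDIR=${TMPDIR:-/tmp}/sync_email_sketch.lock

main() {
    if ! acquire_lock "$LOCKDIR"; then
        echo "already running (lock $LOCKDIR exists); exiting" >&2
        return 1
    fi
    # Only the instance that owns the lock installs the cleanup trap, so a
    # losing instance never deletes the winner's lock directory.
    trap 'release_lock "$LOCKDIR"' EXIT
    echo "lock acquired; sync steps would run here"
}

# In a real script the body would end with: main "$@"
```

Setting the trap only after the lock is won matters: a trap installed at the top of the script would make a second, refused instance remove the first one's lock on its way out.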
