RFC: notmuch powered (personal) (end-to-end) e-mail system

Subject: RFC: notmuch powered (personal) (end-to-end) e-mail system

Date: Sun, 20 Mar 2011 16:07:50 +0200

To: notmuch@notmuchmail.org

Cc:

From: Ciprian Dorin Craciun


    Hello all! (Sorry for the long email.)

    I've been "struggling" for some time to get rid of the current
"de-facto" email solutions (e.g. GMail, Zimbra), and I've been
passively observing the notmuch project and community for a while.

    Although I've forwarded all my email to a single account, and I'm
currently mirroring my GMail account locally (using `mbsync`),
indexing it with notmuch, and collecting spam mails for later filter
training, I'm unfortunately unable to "convert", because the current
notmuch-powered solutions have (some of) the following shortcomings (I
don't want to offend anyone, so please take these as observations):
    * the most feature-complete UI is the Emacs one -- thus remote
access is limited (I mean from an arbitrary computer with only a web
browser); (and I'm not a very big fan of Emacs;)
    * most are still dependent on external IMAP systems -- this is not
a problem with notmuch itself, but with the integrating clients;
    * SPAM -- as above -- is not integrated;
    * filtering (tag applying) is not automatic (as in integrated in
notmuch itself or the client), but triggered through external scripts;

    As such I'm thinking of implementing a custom end-to-end email
system and I would like to hear your feedback before embarking on such
a task.

    I'm targeting the following features:
    * (inbound) SMTP integration, thus once an email is received it is
automatically pushed through the system; (I'm primarily targeting
those users who can afford to run their own SMTP server; but the solution
could still be adapted for those that only want the other features;)
    * automatic spam filtering, and tag applying;
    * automatic email triggers based on tags (such as user
notifications, forwarding, etc.)
    * remote RPC-like access to the whole system;
    * remote Web user interface;

    About the overall architecture, I'm thinking of adopting the following:
    * in general the whole system is decomposed into independent
components (long-lived OS daemons), each of which does a particular
job (see below);
    * all the components communicate with each other through a
message queue system (for example ZeroMQ or RabbitMQ);
    * all the communication is JSON-based;
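    A possible shape for the JSON messages exchanged over the queue, as
a Python sketch. The envelope fields (`type`, `id`, `payload`), the
`email.stored` message type used in the usage example and the
`Dispatcher` class are assumptions for illustration; the transport
itself (ZeroMQ, RabbitMQ) is left out, so the code only deals with the
bytes a component would send or receive.

```python
import json
import uuid
from typing import Callable


def make_message(msg_type: str, payload: dict) -> bytes:
    """Wrap a payload in a hypothetical envelope and encode it as UTF-8 JSON."""
    envelope = {"type": msg_type, "id": str(uuid.uuid4()), "payload": payload}
    return json.dumps(envelope).encode("utf-8")


def parse_message(data: bytes) -> dict:
    """Decode an envelope, rejecting ones that lack the expected fields."""
    envelope = json.loads(data.decode("utf-8"))
    for key in ("type", "id", "payload"):
        if key not in envelope:
            raise ValueError(f"missing field: {key}")
    return envelope


class Dispatcher:
    """Routes envelopes to per-type handlers, skipping redelivered ids."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], None]] = {}
        self.seen: set[str] = set()

    def on(self, msg_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[msg_type] = handler

    def deliver(self, data: bytes) -> bool:
        """Handle one message; return False if it was a duplicate."""
        envelope = parse_message(data)
        if envelope["id"] in self.seen:
            return False
        self.handlers[envelope["type"]](envelope["payload"])
        self.seen.add(envelope["id"])  # recorded only after the handler succeeded
        return True
```

Since raw emails are not necessarily valid UTF-8, it seems simpler to
put only the email store's key in the payload and let consumers fetch
the raw bytes from the store. The `seen` set only deduplicates for the
lifetime of one process; surviving restarts would require persisting it.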

    The components would be:
    * SMTP inbound gateway -- for example I could take qmail or
Postfix and replace the delivery agent with a custom process that
pushes the email into the system; (any other solution suggestions?);
    * email store -- as the name suggests, it is a simple
key-value-like store that should persist raw email messages; it should
be as robust as possible, and its contents should be the only thing
needed to reconstruct all the other derived data; (here I could use a
simple process that maintains a maildir, a BerkeleyDB wrapper, or even
something more sophisticated;)
    * spam filter -- which either classifies the email or trains the
spam filter; (for example I would use bogofilter;)
    * email index -- this is where notmuch would come into play; it
would be fed with emails, automatically apply tags to them, and issue
trigger notifications based on those tags; it also maintains the set
of filters and tags to apply automatically;
    * (maybe) a coordinator that would delegate requests to the above
components and monitor them; but if I use RabbitMQ and design the
above components carefully, they could drive each other;
    * a RESTful web service that would mediate access to all the
above components;
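    A minimal sketch of the email store combined with the SMTP delivery
hook, in Python, assuming a maildir on a POSIX file system. It follows
the usual maildir delivery pattern (write into `tmp/`, fsync, then
atomically rename into `new/`), but names each file by the SHA-256 of
its raw bytes so the maildir doubles as a content-addressed key-value
store; that naming scheme and the `store`/`load` functions are
assumptions, not something notmuch or the MTAs require.

```python
import hashlib
import os
import sys


def ensure_maildir(root: str) -> None:
    """Create the tmp/, new/ and cur/ subdirectories if they are missing."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def store(root: str, raw: bytes) -> str:
    """Durably store a raw message and return its key (SHA-256 hex digest)."""
    ensure_maildir(root)
    key = hashlib.sha256(raw).hexdigest()
    final = os.path.join(root, "new", key)
    if os.path.exists(final):
        return key  # identical bytes already stored: storing is idempotent
    tmp = os.path.join(root, "tmp", f"{key}.{os.getpid()}")
    with open(tmp, "wb") as f:
        f.write(raw)
        f.flush()
        os.fsync(f.fileno())  # contents reach the disk before becoming visible
    os.replace(tmp, final)  # atomic rename within one file system
    dir_fd = os.open(os.path.join(root, "new"), os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename itself durable (POSIX)
    finally:
        os.close(dir_fd)
    return key


def load(root: str, key: str) -> bytes:
    """Return the raw bytes stored under key; raise KeyError if absent."""
    try:
        with open(os.path.join(root, "new", key), "rb") as f:
            return f.read()
    except FileNotFoundError:
        raise KeyError(key) from None


if __name__ == "__main__":
    # Delivery-agent mode: the MTA pipes one message on stdin; the maildir
    # root is the first command-line argument.
    try:
        print(store(sys.argv[1], sys.stdin.buffer.read()))
    except OSError:
        sys.exit(os.EX_TEMPFAIL)  # ask the MTA to keep the message and retry
```

Used as a delivery agent (for example from Postfix's `pipe` transport,
which interprets `sysexits.h` exit codes), an I/O error leaves the
message queued at the MTA instead of losing it. Note that a client which
moves messages to `cur/` and appends flags to their names would break
the key lookup in `load`.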

    For now I have the following uncertainties:
    * how should I handle multiple users? I think each user should
have their own store / notmuch / bogofilter instance (at least in terms
of storage, if not even in terms of separate daemons);
    * should I keep the emails in a file-system, or in a key-value
store? (the file-system is less bug-prone, but I'm confident that a
BerkeleyDB instance would be more efficient);
    * should I use libnotmuch, or, for starters, just write a wrapper
around the notmuch command-line tool?
    * and the most pressing one, transactions: I would like no message
ever to be half-processed or lost; as such I need notmuch to behave
transactionally -- indexing the message and tagging it should be
atomic and durable; (is there a way, with libnotmuch, to control the
underlying Xapian database?)
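    Whatever libnotmuch offers here, one way to get "never lost, never
half-processed" at the pipeline level is sketched below in Python: an
fsync'd journal records each completed step per message, so after a
crash the pipeline re-runs whatever was unfinished. This assumes every
step (index, tag, trigger) can safely be repeated; the step names, the
`Journal` class and `run_steps`/`recover` are hypothetical, and the
result is at-least-once execution of each step, not true atomicity of
indexing plus tagging.

```python
import json
import os
from typing import Callable, Iterable

# Hypothetical steps run after the raw message is durably in the email store.
STEPS = ("indexed", "tagged", "triggered")


class Journal:
    """Append-only, fsync'd log of (message key, completed step) records."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done: dict[str, set[str]] = {}
        if not os.path.exists(path):
            return
        with open(path, "rb+") as f:
            data = f.read()
            complete = data[: data.rfind(b"\n") + 1]
            if len(complete) != len(data):
                f.truncate(len(complete))  # drop a torn last record left by a crash
        for line in complete.decode("utf-8").splitlines():
            entry = json.loads(line)
            self.done.setdefault(entry["key"], set()).add(entry["step"])

    def record(self, key: str, step: str) -> None:
        """Append a completion record; it counts only once fsync returns."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "step": step}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.done.setdefault(key, set()).add(step)


def run_steps(journal: Journal, key: str,
              actions: dict[str, Callable[[str], None]]) -> None:
    """Run, in order, every step not yet journaled as complete for key."""
    for step in STEPS:
        if step in journal.done.get(key, set()):
            continue
        actions[step](key)  # assumed idempotent: it may re-run after a crash
        journal.record(key, step)


def recover(journal: Journal, stored_keys: Iterable[str],
            actions: dict[str, Callable[[str], None]]) -> None:
    """After a restart, finish every stored message not fully processed."""
    for key in stored_keys:
        run_steps(journal, key, actions)
```

On restart, `recover` would be called with every key found in the email
store, which keeps the store the single source of truth.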

    Suggestions? Considerations?

    Ciprian.
