Re: Distributed Notmuch

Subject: Re: Distributed Notmuch

Date: Tue, 10 Jan 2012 03:54:47 -0000

To: notmuch@notmuchmail.org

Cc:

From: Jan Pobrislo


Quoting Ethan Glasser-Camp (2012-01-08 11:23:59)
>Hi guys,
>
> ...
>
>In brainstorming about the One True Mail Setup, my friend suggested to 
>me that Maildir/IMAP are not really the best choices for mail storage. 

In my opinion Maildirs are a very good mail storage format; the issue is
just that IMAP can't transfer them in their entirety and simplicity.

>Among other flaws: to synchronize mail via IMAP you have to check the 
>headers of each message, which means a lot of bandwidth;

IMAP has UIDs, so a client does not have to re-check the headers of
every message to find the new ones, see:
http://tools.ietf.org/html/rfc3501#section-2.3.1.1
But I do agree IMAP is indeed not a very good protocol.
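
A minimal sketch of how a client could use those UIDs to decide what to
fetch, assuming it caches the UIDVALIDITY and the highest UID it has
already seen for a mailbox. The function name and parameters are
hypothetical, and this only covers new arrivals, not deletions or flag
changes:

```python
def plan_fetch(cached_uidvalidity, cached_last_uid,
               server_uidvalidity, server_uidnext):
    """Return the UID set to fetch, or None when nothing new arrived.

    All four arguments are plain integers. The server values are the
    UIDVALIDITY and UIDNEXT reported when the mailbox is selected.
    """
    if cached_uidvalidity != server_uidvalidity:
        # The server reassigned UIDs, so the cached ones mean nothing
        # any more and the whole mailbox has to be resynchronized.
        return "1:*"
    if server_uidnext <= cached_last_uid + 1:
        # UIDNEXT only changes when messages are added.
        return None
    # Only the messages that arrived since the last sync.
    return f"{cached_last_uid + 1}:{server_uidnext - 1}"


print(plan_fetch(1, 10, 1, 15))
```

The returned string is meant to be passed to a UID FETCH command, so
only the new messages cross the wire.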

>compress Maildir, meaning lots of wasted space;

There are several approaches to compressing the filesystem that can be
used with maildirs, but this could easily become a bottleneck for most
setups.

>My friend suggested that instead it might be better to dump mail into
>some kind of database, for example CouchDB, and synchronize it that way. 

Some time ago I pondered putting emails into MongoDB so the client does
not have to deal with parsing MIME, but this does not give you any big
advantage for synchronization. Rather, you'll run into the
consistency/availability/partition-tolerance trade-off. You'll have to
choose whether you want to support offline write operations and, if so,
how you will handle the conflicts that appear. DVCSes are built to make
this as easy as possible; databases usually are not. I cannot comment on
CouchDB and its MVCC, but I still doubt it would be as practical as a
true DVCS.

By the way, I highly recommend this blog post series:
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1

> ...
>
>So my question for the wizards on this list is what their idea of the 
>One True Mail Setup would be in a perfect, or slightly better, world, 
>and what needs to be done to get there. I know some people use one 
>notmuch install that they access remotely. For myself, I'm on a pretty 
>limited Internet connection, so low bandwidth/offline access are big for 
>me, and despite Nicolas Sebrecht and Sebastian Spaeth's heroic work on 
>OfflineIMAP, it still uses a lot of bandwidth to sync. And obviously the 
>whole point of this exercise is tag synchronization..

I tend to go offline with my laptop too, so I can see what you are
talking about. For me the Ideal Mail Setup would be:

* access via ssh to limited/pseudoshell account
  - ssh handles authentication far better than sasl-based apps
  - ssh is designed to allow multiple operations in parallel including
    large uploads/downloads that can be resumed
* maildir is accessible via sftp (sshfs) and ssh+rsync
* notmuch can be launched from the restricted shell, and every new mail
  is indexed
* there is a database of messages, tags and filenames, kept under DVCS.
  With the aid of this database full three-way merges may be performed.
* once a client is connected, it should have a way to listen for the
  change messages that the server pushes

This would allow for convenient operation both in online (storage-free)
and offline (replicated) mode.
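
To illustrate the three-way merge mentioned in the list above, here is a
rough sketch for the tags of a single message. It assumes tags are plain
sets of strings and that "base" is the state both sides last agreed on;
the function name is hypothetical and this is not how any existing tool
does it:

```python
def merge_tags(base, local, remote):
    """Three-way merge of one message's tags.

    A tag added on either side is kept, a tag removed on either side
    is dropped. An added tag is never in base and a removed tag always
    is, so the two sets cannot overlap and no conflict can arise.
    """
    base, local, remote = set(base), set(local), set(remote)
    added = (local - base) | (remote - base)
    removed = (base - local) | (base - remote)
    return (base | added) - removed


# The laptop marked the mail as read, the server tagged it "todo".
merged = merge_tags({"inbox", "unread"},
                    {"inbox"},
                    {"inbox", "unread", "todo"})
print(sorted(merged))
```

Filenames and message deletions need more care than tags, because
there the two sides can genuinely disagree.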

I think this is actually pretty implementable. I'd use twisted.conch for
the ssh server (launchpad.net uses this), which can easily be tied in
with dovecot's authentication daemon. Change detection can be done via
inotify/lsyncd. The versioning/merging tool could possibly be based on
the current nmbug (I haven't examined it yet). But I'm pretty sure I
won't have time for a project of such scale in the near future.
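
As a rough idea of what the change messages could carry, here is a
sketch that compares two listings of maildir filenames. Note that it
swaps in plain snapshot comparison instead of inotify, and the function
names are hypothetical. It relies only on the maildir convention that
flags follow ":2," in the filename:

```python
def split_name(filename):
    """Split a maildir filename into (unique key, set of flag letters)."""
    key, sep, info = filename.partition(":2,")
    return key, set(info) if sep else set()


def diff_snapshots(old, new):
    """Compare two filename listings of the same maildir.

    Returns three sorted lists of unique keys: messages that appeared,
    messages that vanished, and messages whose flags changed.
    """
    old_flags = dict(split_name(name) for name in old)
    new_flags = dict(split_name(name) for name in new)
    added = sorted(new_flags.keys() - old_flags.keys())
    removed = sorted(old_flags.keys() - new_flags.keys())
    reflagged = sorted(key for key in old_flags.keys() & new_flags.keys()
                       if old_flags[key] != new_flags[key])
    return added, removed, reflagged


before = ["a:2,", "b:2,S", "c"]
after = ["a:2,S", "c", "d:2,"]
print(diff_snapshots(before, after))
```

With inotify the server would learn about each rename as it happens
rather than rescanning, but the resulting events would look much the
same.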

</braindump>
