Re: [notmuch] Git as notmuch object store (was: Potential problem using Git for mail)

Subject: Re: [notmuch] Git as notmuch object store (was: Potential problem using Git for mail)

Date: Mon, 25 Jan 2010 00:19:24 -0500 (EST)

To: martin f krafft

Cc: notmuch

From: Asheesh Laroia


On Mon, 25 Jan 2010, martin f krafft wrote:

> also sprach Asheesh Laroia <asheesh@asheesh.org> [2010.01.21.1928 
> +1300]:
>>> I suppose that I never actually considered merges on the IMAP server 
>>> side, but obviously the IMAP server has to work off a clone, and that 
>>> means it needs to merge.
>>
>> It's not "merge" that's unsafe; that just builds a tree in the git 
>> index (assuming no conflicts). It's the ensuing process of git writing 
>> a tree to the filesystem that is problematic.
>
> As you say, there is no way to make that atomic, I am afraid.
>
>> I could probably actually write a wrapper that locks the Maildir while 
>> git is operating. It would probably be specific to each IMAP server.
>
> Ouch! I'd really rather not go there.

You say "Ouch," but you should know that Dovecot *already* does this. I 
don't mind interoperating with that.

See http://wiki.dovecot.org/MailboxFormat/Maildir, section "Issues with 
the specification", subsection "Locking". I call this the famous readdir() 
race. Without this lock, Maildir is fundamentally incompatible with IMAP 
-- one Maildir-using process modifying message flags could make a 
different Maildir-using process think that the message has been deleted. 
When mail temporarily disappears from a local Mutt, that's not the end of 
the world. For IMAP, it will make the IMAP daemon (one of the 
Maildir-using processes) tell IMAP clients that the message has been 
deleted and expunged.
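A wrapper like that could be sketched as follows, assuming a dotlock that every cooperating process creates with O_CREAT|O_EXCL before touching the Maildir. The lock file name `maildir.lock` and the function names are hypothetical; to interoperate with Dovecot you would have to take whatever lock its Maildir code actually honours, as described on the wiki page above.

```python
import os
import subprocess
import time
from contextlib import contextmanager


@contextmanager
def maildir_dotlock(maildir, lock_name="maildir.lock", timeout=10.0, poll=0.1):
    """Hold an exclusive dotlock inside `maildir` for the duration of the block.

    `lock_name` is a hypothetical default; it must match the lock file the
    IMAP server really uses, or the lock protects nothing.
    """
    path = os.path.join(maildir, lock_name)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation atomic: only one process can succeed.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {maildir}")
            time.sleep(poll)
    try:
        # Record our PID so a human (or a smarter tool) can spot stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield path
    finally:
        os.close(fd)
        os.unlink(path)


def git_merge_locked(maildir, *merge_args):
    """Run `git merge` in a Maildir checkout while holding the lock."""
    with maildir_dotlock(maildir):
        subprocess.run(["git", "merge", *merge_args], cwd=maildir, check=True)
```

A real dotlock implementation would also have to recover from stale locks left behind by crashed processes (for example by checking the recorded PID or the lock's mtime); this sketch leaves that out.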

>> Note that this means git is fundamentally incompatible with Maildir, not 
>> just IMAP servers.
>
> We had an idea about using Git to replace IMAP altogether, along with 
> making notmuch use a bare Git repository as object store. The idea is 
> that notmuch uses low-level Git commands to access the .git repository 
> (from which you can still check out a tree tying the blobs into a
> Maildir). The benefit would be compression, lower inode count (due to 
> packs), and backups using clones/merges.
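As a rough illustration of why a Git object store deduplicates well: Git names each blob by the SHA-1 of a short header plus the content, so the same message delivered twice becomes one object. The sketch below computes that id in pure Python and, assuming a `git` binary on PATH, stores and reads messages through the plumbing commands `git hash-object -w --stdin` and `git cat-file blob`. The function names and the `git_dir` parameter are illustrative, not part of notmuch.

```python
import hashlib
import subprocess


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id Git assigns to `data` stored as a blob.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    raw content, so identical messages always map to the same object.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def store_message(git_dir: str, message: bytes) -> str:
    """Write `message` as a blob into the (possibly bare) repo at `git_dir`."""
    out = subprocess.run(
        ["git", "--git-dir=" + git_dir, "hash-object", "-w", "--stdin"],
        input=message, capture_output=True, check=True,
    )
    return out.stdout.decode().strip()


def load_message(git_dir: str, blob_id: str) -> bytes:
    """Read a message blob back out of the repository."""
    out = subprocess.run(
        ["git", "--git-dir=" + git_dir, "cat-file", "blob", blob_id],
        capture_output=True, check=True,
    )
    return out.stdout
```

`store_message` should return the same id that `git_blob_id` computes locally; compression and the lower inode count then come from Git packing those loose objects later.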

Sure, that makes sense to me.

> You could either have the MDA write to a Git repo on the server side and 
> use git packs to download mail to a local clone, or one could have e.g. 
> offlineimap grow a Git storage backend. The interface to notmuch would 
> be the same.

Yeah, I generally like this.

> If we used this, all the rename and delete code would be handled by
> Git and could be removed from notmuch. In addition, notmuch could 
> actually use Git tree objects to represent the results of searches, and 
> you could check out these trees. However, deleting messages from search
> results would not have any effect on the message or its existence in 
> other search results, much like what happens with mairix nowadays.
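To see why deleting a message from a search-result tree leaves the message itself alone, here is a sketch of how a flat Git tree object is serialised and hashed (regular files only, mode 100644, no subdirectories). A search result is modelled as a hypothetical mapping from filenames to blob ids; dropping an entry yields a different tree id, while the blob and every other tree that references it are untouched.

```python
import hashlib


def git_tree_id(entries: dict) -> str:
    """Return the object id of a flat Git tree built from {filename: blob_hex_id}.

    Each entry is serialised as "100644 <name>" plus a NUL byte, followed by
    the 20 raw bytes of the blob id. Git requires entries sorted by name;
    for plain files, sorting the names is enough.
    """
    body = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob_id)
        for name, blob_id in sorted(entries.items())
    )
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()
```

This only computes the id; actually writing such a tree into a repository could go through `git mktree`, which reads `git ls-tree`-style lines on stdin.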

That's okay with me.

> I think we all kinda agreed that the Maildir flags should not be used by 
> notmuch and that things like Sebastian's notmuchsync should be used if 
> people wanted flags represented in Maildir filenames.

Aww, I like Maildir flags, but if there's a sync tool, I'm fine with that.

> Instead of a Maildir checkout, notmuch could provide an interface to 
> browse the store contents in a way that could make it accessible to 
> mutt. The argument is that with 'notmuch {ls,cat,rm,…}', a mutt backend 
> could be trivially written. I am not sure about that, but it's worth a 
> try.

Sure.

> But there are still good reasons why you'd want to have IMAP capability 
> too, e.g. Webmail. Given the atomicity problems that come from Git, 
> maybe an IMAP server reading from the Git store would make sense.

It wouldn't be too hard to write a FUSE filesystem that presented a Git 
repository as files whose contents could not be modified. Dovecot would 
then think it is interacting with an ordinary filesystem.
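The read-only part of such a filesystem is mostly a policy on open(). A minimal sketch, assuming a FUSE binding whose open handler receives the POSIX open flags; `check_open` is a hypothetical helper, and how the returned errno gets reported (a negative return value in libfuse's C API, an exception in typical Python bindings) depends on the binding.

```python
import errno
import os

# Access modes that would let a caller write to an existing message.
_WRITE_ACCESS = os.O_WRONLY | os.O_RDWR
# Flags that would change a message's bytes even without explicit writes.
_MODIFYING_FLAGS = os.O_TRUNC | os.O_APPEND


def check_open(flags: int) -> int:
    """Return 0 to allow an open() of a message file, or errno.EROFS to refuse it.

    Only read-only opens are allowed, because the file contents are Git
    blobs and must never change.
    """
    if flags & _WRITE_ACCESS or flags & _MODIFYING_FLAGS:
        return errno.EROFS
    return 0
```

Renames (i.e. Maildir flag changes) and new deliveries would need their own handlers; this covers only the "contents never change" rule.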

> However, this all sounds like a lot of NIH and reinvention. It's
> a bit like the marriage between the hypothetical Maildir2 and Git,
> which is definitely worth pursuing. Before we embark on any of this,
> however, we'd need to define the way in which Git stores mail.

Sure. If it were me, I'd just say, "For phase 1 of notmuch, just have git 
store Maildir spools." When you need a filesystem interface, e.g. for 
Dovecot, add a FUSE wrapper.

See how far that can take you, and then see if version 2 is necessary. 
(-:

> Stewart, you've worked most on this so far. Would you like to share your 
> thoughts?

I'll listen, too.

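Just don't fall into the trap of thinking Maildir is compatible with IMAP. 
It's not, because as I understand things, the filesystem doesn't guarantee 
that iterating over a directory returns every file if another process is 
adding, removing, or renaming files in it at the same time. (POSIX leaves 
it unspecified whether readdir() returns an entry that was added or 
removed after opendir(), and a rename is both.)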
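To make the consequence concrete, here is a sketch of the kind of rescan logic an IMAP daemon might use (hypothetical, not Dovecot's code): it keys messages by the unique part of the Maildir filename and treats any key missing from the new scan as expunged. A flag change on its own is harmless, but if readdir() misses the file while it is being renamed, the message is reported as deleted.

```python
def maildir_key(filename: str) -> str:
    """Return the unique part of a Maildir filename (everything before ':2,')."""
    return filename.split(":", 1)[0]


def expunged_since(previous_scan, current_scan) -> set:
    """Return the keys present in the previous scan but absent from the current one.

    A daemon reasoning like this announces an EXPUNGE for every missing key,
    so an entry that readdir() skipped mid-rename looks exactly like a
    deletion.
    """
    before = {maildir_key(f) for f in previous_scan}
    after = {maildir_key(f) for f in current_scan}
    return before - after
```

For example, with a previous scan of `["1264396764.M1P2.host:2,", "1264396765.M3P4.host:2,S"]`, a new scan where the first message simply gained the S flag yields no expunges, while a new scan that skipped it entirely reports `1264396764.M1P2.host` as expunged.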

I'm not sure, but maybe it's safe if you refuse to ever modify a 
message's flags in the filename.
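For reference, this is why a flag change is a rename: Maildir keeps flags in the info part of the filename, ":2," followed by the flag letters in ASCII order. A minimal sketch with hypothetical helper names:

```python
import os


def with_flags(filename: str, flags: str) -> str:
    """Return the Maildir filename carrying `flags` (e.g. "S" seen, "R" replied).

    The info part is ":2," followed by the flags in ASCII order, so changing
    a message's flags always changes its name.
    """
    unique = filename.split(":", 1)[0]
    return unique + ":2," + "".join(sorted(set(flags)))


def set_flags(cur_dir: str, filename: str, flags: str) -> str:
    """Change a message's flags by renaming it within cur/; return the new name."""
    new_name = with_flags(filename, flags)
    if new_name != filename:
        os.rename(os.path.join(cur_dir, filename), os.path.join(cur_dir, new_name))
    return new_name
```

If flags were never written this way (for instance, kept only in notmuch's own database), the directory listing would change only on delivery and deletion.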

Anyway, as I see it, further hacks that aren't much worse than Dovecot's 
should be considered okay, unless you have a more elegant design up your 
sleeve.

If I'm slightly wrong about something, try to give me the benefit of the 
doubt. It's past midnight. (-:

-- Asheesh.

-- 
There's no real need to do housework -- after four years it doesn't get
any worse.
