[RFC PATCH 00/13] Modular message store code

Subject: [RFC PATCH 00/13] Modular message store code

Date: Wed, 15 Feb 2012 17:01:53 -0500

To: notmuch@notmuchmail.org

Cc:

From: Ethan Glasser-Camp


Hi guys,

I'm submitting as RFC this patch series, which introduces the idea of a "mailstore", a "class" that defines how to access mail, instead of currently assuming it's always some Maildir-ish hierarchy that contains a bunch of mail.

This was listed as a wishlist item on http://notmuchmail.org/feature-requests/, so I went ahead and took a crack at it, learning a lot about the codebase as I did. I'm sure there are tons of stylistic concerns so I'd like to get as much feedback as possible, starting of course with "Does a feature like this have a chance of ever making it in" and followed by "Am I on the right track".

Note that this series breaks the language bundings; the Python bindings have very minimal tests so I very minimally fixed them (probably still broken in other ways), but the Ruby and Go bindings are probably super broken. Note also that the one message = one file approach is pretty thoroughly embedded into Notmuch and there are lots of places (again such as the Python bindings) where this has yet to be rooted out.

They say an interface isn't trustworthy until you've implemented it three times, so while most of the patches define the interface, the last patch adds support for an experimental CouchDB backend. It's got at least one known bug (it indexes everything, whether or not it's a mail object), sometimes it hangs when trying to access a message, and it's definitely leaking memory in notmuch-new. I haven't done strict timing comparisons but it seems like notmuch-search and notmuch-show are approximately as fast as with maildir and notmuch-new takes maybe 25% longer. Nevertheless, it is included as a demonstration that the interface is at least plausible.

The implementation of "mailstores" follows these principles:

- Messages still have "filenames", but the mailstore gets to define its own semantics for these filenames (document ids, file + byte offset..). _notmuch_message_ensure_filename_list converts all filenames coming out of the DB to be absolute paths centered at the user's database path, which is inconvenient for Couchdb, but workable.

- The user keeps all mail in one mailstore. The alternative, which is that each message can be in a different mailstore, seemed like a lot more work. "One mailstore" makes sense when it's cases like maildir vs. couchdb, but if we decide to introduce a "mbox" backend -- directory tree with mbox files -- then it might "overlap" with the maildir mailstore, and then who knows?

Patch 1 introduces the configuration parameter database.type, which will be used to select a mailstore type.

Patch 2 introduces the most important API for a mailstore, notmuch_mailstore_open, and makes it required when creating a message_file. Patch 3 fixes the Python breakage this creates.

Patch 4-6 replace the other places where files are opened directly with calls to notmuch_mailstore_open.

Patch 7-8 prepare notmuch-new to be more generic. I couldn't find an elegant way to combine add_files logic with other mailstores, so I just decided each mailstore should have its own add_files function.

Patches 9-11 add other functions to the mailstore API -- to rename files, to close files, and to "construct" a mailstore. Patch 12 uses the "close" API to close files (where necessary).

Patch 13 proposes the CouchDB mailstore as one block of code.

Points to address:

- Where to put the new notmuch_mailstore_t* parameter in all these functions? I applied a "decreasing order of importance" heuristic, but it's a little weird in places like notmuch_database_open and notmuch_database_create.

- Is there a better, more elegant way to pass around mailstore objects without adding parameters to each function? Additionally, should I be using some other class-like mechanism for mailstores instead of hacks involving structs?

- Should this be configured under [database] or perhaps under some other new heading?

- How strict is the rule that braces shouldn't be there if the body of a loop/conditional is only one line? This feels really strange to me coming from Python.

- If I'm already touching code, should I add other drive-by fixes, as in patch 05, or should I resolutely refuse to change anything, as in patch 07?

- Should something like the CouchDB backend be optional, and if so, what mechanisms do I need to use to make that happen?

Thanks so much for your time!
Ethan


Thread: