Re: [RFC PATCH 00/14] modular mail stores based on URIs

Date: Thu, 28 Jun 2012 21:45:17 +0100

To: Ethan


From: Mark Walters

On Thu, 28 Jun 2012, Ethan wrote:
> I sent this at first as a reply-only-to-sender. Oops! Sorry Mark for the
> double send.
> On Wed, Jun 27, 2012 at 5:17 AM, Mark Walters wrote:
>> > Personally, this isn't my favorite approach, for the following reasons:
>> >
>> > 1. Notmuch, at some point in its history, chose to store file paths
>> > relative to a "mail database", with the intent that if this mail
>> > database was moved, filenames would not change and everything would
>> > Just Work (tm). The above scheme completely reverses this design
>> > decision, and in general completely breaks this relocatability. I
>> > don't see any easy way to handle this problem. This isn't just a
>> > wishlist feature; at least two things in the test suite (caching of
>> > corpus.mail, and the atomicity tests) rely on this behavior.
>> Why can't the URI just store a relative path, at least for maildir://
>> and mbox:// ? It is purely internal to notmuch so it doesn't need to be
>> very standard.
> Well, relative to where? This is especially relevant now that we can have
> multiple mail stores. It sounds like you are suggesting that all mbox://
> URIs are relative to an "mbox root", but the fundamental question is how to
> pass that information from the configuration into the library.

I was thinking of just having one mail root and inside that there could
be maildirs and mboxes. Everything would still be relative to the root.
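
As a minimal sketch of the single-root idea (not anything in the patch
series): assume the database stores only root-relative paths and the root
itself comes from configuration. The helper name resolve_under_root and
the example paths are hypothetical.

```python
import posixpath


def resolve_under_root(mail_root: str, stored_path: str) -> str:
    """Turn a stored, root-relative path into a full path (hypothetical helper).

    Only the relative part would live in the database, so moving the
    mail root means changing one configuration value.
    """
    if posixpath.isabs(stored_path):
        raise ValueError("stored paths are assumed to be relative to the mail root")
    return posixpath.normpath(posixpath.join(mail_root, stored_path))


# The same stored path resolves correctly after the root is relocated.
old = resolve_under_root("/home/user/mail", "lists/archive.mbox")
new = resolve_under_root("/srv/mail", "lists/archive.mbox")
```

Both maildir files and mbox files would sit under the same root here, so
no per-store base directory has to be passed into the library.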

> Even using configuration itself may be problematic, because only the CLI
> uses the configuration, and language bindings like Python and Ruby might
> get out of sync! (But note also that the Python bindings currently use
> .notmuch-config to find the database path, so maybe it's not a big deal.)
> If I could do whatever I wanted, every mailstore would get registered
> somehow and the URIs could use those registered names to specify what
> they're relative to: maybe using hostname, such as
> maildir://university-mail/some-mail-file, mbox://old-unix-system/some.mbox.
> Then changing these names in .notmuch-config would be fine. I just don't
> know how to pass that configuration information without an approach like in
> the past patch series.
>> > 2. Mail access information, i.e. open connections, etc. can only be
>> > stored in variables global to the mailstore code, and cannot be stored
>> > as private members of a mailstore object. This is more an aesthetic
>> > concern than a functional one.
>> >
>> > Anyhow, the following (enormous) patch series implement this design. I
>> > used uriparser as an external library to parse URIs. The API for this
>> > library is a little idiosyncratic. uriparser supports parsing Unicode
>> > URIs (strings of wchar_t), but I just used ASCII filenames because I
>> > think that's what comes out of Xapian.
>> Why use a library? Isn't it just a question of does the string contain
>> // and, if so, splitting it? I guess that // is a nice separator as I
>> think we can assume that a true path does not contain it (since a
>> filename cannot contain /).
> The URIs are true URIs. Filenames are provided by the "path" segment of the
> uri -- everything from the first slash after the hostname up to a ? for
> query arguments. My concern was that filenames could (in theory) contain #
> or ?, and in practice they contain : (maildir flags). I figured it was
> better to do it right.

This is similar to your question to Jamie:

>  1. Are URIs the way to specify individual messages, despite bremner's
>  concerns about too much of the API being strings? Is adding another library
>  the easiest way to parse URIs?

In my opinion, the nice thing about using strings is that it does not
require any changes to the Xapian database to store them. I think using
URIs may not be best, though, as they seem to be annoying to parse (as
filenames can contain the same characters) and you seem to need to work
around the parser in some cases.

I wonder if the following would be practical: use // as the field separator,

e.g. mbox://filename//start_of_message+length

I think 2 consecutive slashes // is about the only thing we can assume
is not in the path or filename. Since it is not in the filename I think
parsing should be trivial (thus avoiding the extra library).
Secondly, I would prefer to keep maildirs as just the bare file name, so
the existence of // can be the signal that there is some other
scheme. This is asymmetric, but is rather more backward compatible.
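
A minimal sketch of how such strings might be split, assuming the format
scheme://path//start+length with decimal byte offsets, and assuming a
string without // is a bare maildir path. The names Location and
parse_location are hypothetical, and nothing here is notmuch API.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    scheme: str
    path: str
    start: Optional[int]   # byte offset of the message, None for maildir
    length: Optional[int]  # byte length of the message, None for maildir


def parse_location(stored: str) -> Location:
    """Split a stored location string using // as the only separator."""
    if "//" not in stored:
        # No // at all: treat it as a plain maildir file name.
        return Location("maildir", stored, None, None)

    scheme, sep, rest = stored.partition("://")
    if not sep or not scheme:
        raise ValueError("expected scheme:// at the start: %r" % stored)

    # The path cannot contain //, so the last // starts the message span.
    path, sep, span = rest.rpartition("//")
    if not sep:
        raise ValueError("missing // before start+length: %r" % stored)

    start_text, plus, length_text = span.partition("+")
    if not plus:
        raise ValueError("expected start+length after //: %r" % stored)
    return Location(scheme, path, int(start_text), int(length_text))


mbox_msg = parse_location("mbox://archive/2012.mbox//1024+2048")
maildir_msg = parse_location("inbox/cur/1340916317.M1P2.host:2,S")
```

Characters such as ?, # and : stay in the path untouched, since only //
(and the + in the final field) is given any meaning.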

I have read most of the patches and will send a couple of specific
comments, but I completely agree with you that the first thing is to
decide the above.

Finally, do note that these are just my views and others may have very
different ideas!

Best wishes