Thanks for going through it; I know there's a lot there.

On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters <markwalters1009@gmail.com> wrote:
I was thinking of just having one mail root and inside that there could
be maildirs and mboxes. Everything would still be relative to the root.

I'm hesitant to have directories that contain both maildirs and mboxes. It should be possible to tell a maildir file from an mbox file unambiguously (an mbox always starts with "From ", with no colon), but it sounds kind of fragile.
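
To make the check concrete, here is a minimal sketch in Python (not notmuch code; the names starts_like_mbox and classify_file are made up for illustration). It assumes the only evidence we look at is the first five bytes of the file:

```python
# Hypothetical helper names; this is an illustration, not notmuch's API.
MBOX_MAGIC = b"From "  # mbox separator line: "From" followed by a space, no colon


def starts_like_mbox(head):
    """Return True if the leading bytes look like an mbox "From " line."""
    return head.startswith(MBOX_MAGIC)


def classify_file(path):
    """Guess "mbox" or "maildir" from the first bytes of the file at path."""
    with open(path, "rb") as f:
        head = f.read(len(MBOX_MAGIC))
    return "mbox" if starts_like_mbox(head) else "maildir"
```

A message whose first header is "From: " is not matched, because its fifth byte is a colon rather than a space. The fragile part is the other direction: any single-message file that happens to begin with a "From " line would be classified as an mbox.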

>  1. Are URIs the way to specify individual messages, despite bremner's
>  concerns about too much of the API being strings? Is adding another library
>  the easiest way to parse URIs?

In my opinion, the nice thing about using strings is that it does not require
any changes to the Xapian database to store them. I think using URIs may
not be best though as they seem to be annoying to parse (as filenames
can contain the same characters) and you seem to need to work around the
parser in some cases.

I think that's more the fault of the parser than of the URIs. If glib came with a parser, that would be great. There aren't a lot of options for pure-C URI parsing. Besides uriparser, there's also some code in the W3C sample code library, but it looked like integrating it would be a pain so I let it go.
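
For what it's worth, the "filenames can contain the same characters" problem shows up with any generic URI parser, not just uriparser. Here is a small sketch using Python's standard urllib.parse rather than a C library (the mbox scheme and the helper names to_uri and from_uri are assumptions for illustration only):

```python
from urllib.parse import quote, unquote, urlsplit


def to_uri(path, scheme="mbox"):
    """Percent-encode a filesystem path so '#' and '?' survive URI parsing."""
    return scheme + "://" + quote(path)


def from_uri(uri):
    """Recover the filesystem path from a URI built by to_uri."""
    return unquote(urlsplit(uri).path)


def naive_path(path, scheme="mbox"):
    """What a generic parser reports as the path if nothing is escaped."""
    return urlsplit(scheme + "://" + path).path
```

With a filename such as /var/mail/odd#name?x, the unescaped form is cut off at the '#', while the escaped form round-trips. So the cost of URIs is that every producer and consumer has to agree on the escaping.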

I wonder if the following would be practical: use // as the field
separator:

e.g. mbox://filename//start_of_message+length

I think 2 consecutive slashes // is about the only thing we can assume
is not in the path or filename. Since it is not in the filename I think
parsing should be trivial (thus avoiding the extra library).
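
A rough sketch of what parsing this form could look like, in Python for brevity. The function name parse_location, the choice to split at the last "//", and the treatment of any other string as a bare maildir filename are all assumptions, not part of the patch series:

```python
def parse_location(location):
    """Split 'mbox://filename//start+length' into its parts.

    Returns (kind, filename, start, length); start and length are None
    for anything that does not use the mbox:// prefix.
    """
    prefix = "mbox://"
    if not location.startswith(prefix):
        # Assumption: a string without the prefix is a bare maildir filename.
        return ("maildir", location, None, None)
    rest = location[len(prefix):]
    # Split at the LAST "//" so the filename part is taken as-is.
    filename, sep, span = rest.rpartition("//")
    if not sep:
        raise ValueError("missing // separator: " + location)
    start, plus, length = span.partition("+")
    if not plus:
        raise ValueError("missing + in offset: " + location)
    return ("mbox", filename, int(start), int(length))
```

For example, parse_location("mbox://mail/archive//1024+512") gives ("mbox", "mail/archive", 1024, 512). Splitting at the last separator only works as long as the start+length part never contains "//" itself.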

Can you explain what you mean when you say that two consecutive slashes can't appear in a URL? Ordinary filesystem paths can contain them, and so can file: URLs. (I just looked up file:///home/ethan///////tmp and Firefox handled that OK.) I've sometimes seen machine-generated filenames with double slashes because that way you don't have to make sure the incoming filename was correctly terminated before adding another level.
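
As an illustration of how such paths come about, here is a small Python sketch (join_naive and normalize are hypothetical names; posixpath.normpath is the standard library call):

```python
import posixpath


def join_naive(directory, name):
    """Append a path component without checking for a trailing slash."""
    return directory + "/" + name


def normalize(path):
    """Collapse repeated slashes the way POSIX path resolution treats them."""
    return posixpath.normpath(path)
```

join_naive("/home/ethan/", "tmp") produces "/home/ethan//tmp", which still names the same file as normalize reports, "/home/ethan/tmp". A stored path that was never normalized can therefore contain "//" anywhere.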
 
Secondly, I would prefer to keep maildirs as just the bare file name: so
the existence of // can be the signal that there is some other
scheme. This is asymmetric, but is rather more backward compatible.

Based on your and Jani's reasoning, I did this. Revised patch series follows.

Ethan