On Sun, 01 Jul 2012, Ethan <ethan.glasser.camp@gmail.com> wrote:
> Thanks for going through it, I know there's a lot to go through..
>
> On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters <markwalters1009@gmail.com> wrote:
>
>> I was thinking of just having one mail root and inside that there could
>> be maildirs and mboxes. Everything would still be relative to the root.
>
> I'm hesitant to have directories that contain maildirs and mboxes. It
> should be possible to unambiguously distinguish between a maildir file and
> an mbox file (mboxes always start with "From ", no colon) but it sounds
> kind of fragile.

Well I was thinking you would still need to add specific sub-directories
of db_path that might contain mboxes.

>> > 1. Are URIs the way to specify individual messages, despite bremner's
>> > concerns about too much of the API being strings? Is adding another
>> > library is the easiest way to parse URIs?
>>
>> In my opinion the nice thing about using strings is that it does not
>> require any changes to the Xapian database to store them. I think using
>> URIs may not be best though as they seem to be annoying to parse (as
>> filenames can contain the same characters) and you seem to need to work
>> around the parser in some cases.
>
> I think that's more the fault of the parser than of the URIs. If glib came
> with a parser, that would be great. There aren't a lot of options for
> pure-C URI parsing. Besides uriparser, there's also some code in the W3C
> sample code library, but it looked like integrating it would be a pain so I
> let it go.
>
>> I wonder if the following would be practical: use // as the field
>> separator:
>>
>> e.g. mbox://filename//start_of_message+length
>>
>> I think 2 consecutive slashes // is about the only thing we can assume
>> is not in the path or filename. Since it is not in the filename I think
>> parsing should be trivial (thus avoiding the extra library).
>
> Can you explain what you mean when you say that two consecutive slashes
> can't appear in a URL? Ordinary filesystem paths can contain them, and so
> can file: URLs. (I just looked up file:///home/ethan///////tmp and Firefox
> handled that OK.) I've sometimes seen machine-generated filenames with
> double slashes because that way you don't have to make sure the incoming
> filename was correctly terminated before adding another level.

Nothing outside notmuch (i.e. other applications creating arbitrary
filenames etc) can make notmuch store a // as part of a path so if we
ever do store them in the database it's our own fault. In particular
notmuch can avoid them easily in that they cannot occur in a filename.

>> Secondly, I would prefer to keep maildirs as just the bare file name: so
>> the existence of // can be the signal that there is some other
>> scheme. This is asymmetric, but is rather more backwardly compatible.
>
> Based on your and Jani's reasoning, I did this. Revised patch series
> follows.

I will try and look at that now.

Best wishes

Mark
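
To illustrate the // scheme discussed above, here is a rough sketch (in Python for brevity, not notmuch code) of how a stored location string might be split without a URI library. The names parse_location and MessageLocation are hypothetical, and reading start_of_message+length as a decimal byte offset and byte length is an assumption; the thread only gives the overall shape of the string.

```python
from typing import NamedTuple, Optional


class MessageLocation(NamedTuple):
    """Hypothetical record for where a message lives on disk."""
    scheme: str             # "maildir" for bare filenames, else the scheme name
    filename: str           # path relative to the mail root
    offset: Optional[int]   # assumed: byte offset of the message in the file
    length: Optional[int]   # assumed: byte length of the message


def parse_location(stored: str) -> MessageLocation:
    # A bare filename (no "//" anywhere) is treated as a maildir file,
    # which keeps existing database entries valid.
    if "//" not in stored:
        return MessageLocation("maildir", stored, None, None)

    # Otherwise expect scheme://filename//start+length. This relies on the
    # assumption from the thread that notmuch never stores "//" in a filename.
    fields = stored.split("//")
    if len(fields) != 3 or not fields[0].endswith(":"):
        raise ValueError(f"unrecognised location: {stored!r}")

    scheme = fields[0][:-1]  # drop the trailing ':'
    filename = fields[1]
    start, sep, length = fields[2].partition("+")
    if not sep:
        raise ValueError(f"missing '+length' in: {stored!r}")

    # int() raises ValueError itself if either part is not a number.
    return MessageLocation(scheme, filename, int(start), int(length))
```

With this sketch, a bare name such as cur/msg1 comes back as a maildir entry, while mbox://lists/notmuch.mbox//1024+2048 yields the mbox filename plus an offset of 1024 and a length of 2048.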