Hi Stewart,

On Mon, 15 Feb 2010 11:29:14 +1100, Stewart Smith <stewart@flamingspork.com> wrote:
> Which goes from a 15GB Maildir to a 3.7GB git repo.

That's quite an interesting ratio. I tried a plain git add and git gc
on my mail store, and the resulting repo was approximately 50% of the
mail store's size. Do you think this difference might be caused by the
way you created the packs?

> The algorithm of evenless.pl is basically:
> 1 get next directory entry
> 2 if is directory, recurse into it
> 3 write item to git (git hash-object -w)
> 4 add item to tree object
> 5 if number of items written = 1000
> 5.1 make pack of last 1000 items
> 6 goto 1

So it seems that you have all your mails in a single tree. How long
does it take to calculate the difference between two trees (git
diff-tree --name-status)? This operation will be needed by "notmuch
new" to determine which files/blobs to index.

I suppose it would be better if mail blobs were stored in subtrees. If
a subtree has not changed, git doesn't need to descend into it because
it has the same sha1. I think that storing mails in a structure similar
to .git/objects (i.e. 256 subdirectories named after the first sha1
byte, with file names made of the remaining 38 hex digits of the sha1)
would be reasonable.

> Next step?
>
> Make notmuch be able to read mail out of it and add it to an index
> (oh, and some kind of verification and error checking about creating
> the git repo).

Besides using git to reduce the size of the mail store, another feature
that comes with git for free is synchronization. For this to work, you
only need to store tags in the repo. What might work is to store tags
in files named <mail-name>.tags. The tags would be stored in these
files in alphabetical order, one tag per line. I guess that this makes
it easy to merge tags during synchronization even without writing a
custom git merge driver.

Another point that must be solved if we would like to use git with
notmuch is the license problem.
As Carl pointed out in another thread, Git is licensed under GPLv2
only, whereas notmuch is licensed under GPLv3, and these licenses are
incompatible. So I think we will need some kind of hooks in notmuch
from which external programs (git) will be called.

Cheers,
Michal
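
P.S. To make the two layout ideas above (fan-out subtrees and sorted
.tags files) a bit more concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions: the function names
(git_blob_sha1, fanout_path, format_tags, parse_tags) are hypothetical
and exist neither in notmuch nor in evenless.pl, and it assumes that the
sha1 used for the path is the mail blob's own git sha1.

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")


def git_blob_sha1(data: bytes) -> str:
    """Return the sha1 git assigns to a blob with this content.

    Git hashes the header 'blob <size>\\0' followed by the content.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def fanout_path(sha1_hex: str) -> str:
    """Map a 40-digit hex sha1 to '<first 2 digits>/<remaining 38 digits>'.

    This mirrors the .git/objects layout: the first byte (2 hex digits)
    selects one of 256 subdirectories. Assumption: mails are filed under
    their own blob sha1.
    """
    sha1_hex = sha1_hex.lower()
    if len(sha1_hex) != 40 or not set(sha1_hex) <= HEX_DIGITS:
        raise ValueError("expected a 40-character hex sha1")
    return sha1_hex[:2] + "/" + sha1_hex[2:]


def format_tags(tags) -> str:
    """Serialize tags for a <mail-name>.tags file: sorted, one per line."""
    return "".join(tag + "\n" for tag in sorted(set(tags)))


def parse_tags(text: str) -> set:
    """Read a .tags file back into a set, ignoring empty lines."""
    return {line for line in text.splitlines() if line}


if __name__ == "__main__":
    mail = b"From: someone@example.org\n\nHello\n"
    print(fanout_path(git_blob_sha1(mail)))
    print(format_tags(["unread", "inbox"]), end="")
```

With one tag per line in a stable order, two sides that add different
tags to the same mail touch different lines, which is what should let
git's ordinary line-based merge handle many cases. Whether that holds
up well enough in practice would need to be tested.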