On Thu, 15 Dec 2011 23:07:22 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth David Bremner on Dec 15 at 10:09 pm:
>
> The trouble with this approach is that the OS doesn't have to flush
> logfile to the disk platters in any particular order relative to the
> updates to Xapian.  So, after someone trips over your plug, you could
> come back with Xapian saying you have 500 log entries when your
> logfile comes back with only 20.  The only way I know of to fix this
> is to fsync after the logfile write, which would obviously have
> performance issues.  But maybe there are cleverer ways?

What about just declaring the log invalid in this case and forcing a
"slow-sync"? It should be no harder to detect the log being behind
Xapian than to detect it being ahead.

Another idea would be to replace logging with mkdir(2) and creat(2); I
made some experiments in branch 'tree-dump' in repo

  git://pivot.cs.unb.ca/notmuch

This generates a tree of empty files in the style of nmbug (with an
extra layer of directories at the top to help prevent file system
explosion). It isn't super fast as a way to dump (probably at least 10x
slower than the file-based methods). On the other hand, on this machine
(an i7 950 with a spinning disk) it takes about 1 ms per tag to write
(i.e. 175k tags take about 160 s). It is completely I/O bound, so I
would expect it to be faster on an SSD. I am running LVM on top of
dm-crypt.

The more worrying part is disk usage: the tag tree for 200k messages
uses 400k inodes and 836M of apparent disk usage (according to du),
while the same tags in "sup" format take 11M.

Maybe this could be useful if combined with some scheme to only dump
tags not covered by maildir (for those already using maildir flag
syncing).

d
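For readers who haven't looked at the branch, here is a minimal C sketch of the mkdir(2)/creat(2) idea, assuming a hypothetical layout ROOT/<bucket>/<encoded-message-id>/<encoded-tag>. The names dump_tag, encode_name, bucket_of and mkdir_if_missing, the percent-encoding of unsafe characters, and the use of two hex digits of an FNV-1a hash as the extra directory layer are all illustrative assumptions, not code from 'tree-dump' or nmbug.

```c
/* Hypothetical sketch, NOT code from the 'tree-dump' branch.
 * Assumed layout: ROOT/<bucket>/<encoded-message-id>/<encoded-tag>,
 * where <bucket> is two hex digits of an FNV-1a hash of the message-id,
 * an extra directory level so no single directory grows too large. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Percent-encode bytes that are unsafe in a file name ('/' in particular).
 * The set of "safe" characters is an assumption. */
static int encode_name(const char *in, char *out, size_t outlen)
{
    size_t j = 0;
    assert(outlen > 0);
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || strchr("+-._@=", c) != NULL) {
            if (j + 1 >= outlen)
                return -1;
            out[j++] = (char)c;
        } else {
            if (j + 3 >= outlen)
                return -1;
            snprintf(out + j, outlen - j, "%%%02X", c);
            j += 3;
        }
    }
    out[j] = '\0';
    return 0;
}

/* Hypothetical bucket choice: low byte of a 32-bit FNV-1a hash. */
static unsigned bucket_of(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return (unsigned)(h & 0xffu);
}

/* mkdir(2) that treats "already exists" as success. */
static int mkdir_if_missing(const char *path)
{
    return (mkdir(path, 0755) == 0 || errno == EEXIST) ? 0 : -1;
}

/* Record that MSGID carries TAG by creating an empty file.
 * ROOT must already exist. On success returns 0 and leaves the
 * file's path in PATH. Re-running it for the same pair is harmless. */
int dump_tag(const char *root, const char *msgid, const char *tag,
             char *path, size_t pathlen)
{
    char emsg[1024], etag[256];
    unsigned b = bucket_of(msgid);
    int n, fd;

    if (encode_name(msgid, emsg, sizeof emsg) != 0 ||
        encode_name(tag, etag, sizeof etag) != 0)
        return -1;

    /* ROOT/<bucket> */
    n = snprintf(path, pathlen, "%s/%02x", root, b);
    if (n < 0 || (size_t)n >= pathlen || mkdir_if_missing(path) != 0)
        return -1;

    /* ROOT/<bucket>/<message-id> */
    n = snprintf(path, pathlen, "%s/%02x/%s", root, b, emsg);
    if (n < 0 || (size_t)n >= pathlen || mkdir_if_missing(path) != 0)
        return -1;

    /* ROOT/<bucket>/<message-id>/<tag>: the empty file is the record. */
    n = snprintf(path, pathlen, "%s/%02x/%s/%s", root, b, emsg, etag);
    if (n < 0 || (size_t)n >= pathlen)
        return -1;
    fd = creat(path, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}
```

One consequence of this kind of layout: each per-message directory occupies at least one filesystem block (commonly 4 KiB on ext4), while the empty tag files cost inodes but essentially no data blocks, which plausibly accounts for most of the du figure quoted above.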