On Sat, 10 Dec 2011 23:27:02 -0400, David Bremner <david@tethera.net> wrote:
> This uses the jlog library (http://labs.omniti.com/labs/jlog)
> to atomically log messages in pub-sub model.

Some more explanations.

Part 1: pub-what?
=================

What is pub-sub? Conceptually you can think of it as a set of queues
where each "published" message is magically replicated and put in the
queue of each "subscriber". Of course in practice one only needs one
queue, and to keep track of how far each subscriber has read. So, we
have a queue, and one head pointer per subscriber. We can discard
anything in the queue that all of the subscribers have read past.

In jlog this data structure is on disk, which is why it is called
"durable". This means there are no sockets to communicate between the
publisher (notmuch) and the subscribers (e.g. a proposed tag synching
tool, described more below). They interact via a directory (currently
under .notmuch). Because the data structure on disk is not completely
trivial (not that fancy either, but more than a stream of bytes), both
writers and readers need to use the jlog library to interact with the
queue. "notmuch log" is one such reader.

I'm not that invested in jlog, but I looked around and didn't find any
other similar solutions that had some atomicity guarantees without
some kind of broker (read: yet another daemon running on the machine).

> On this branch you can enable logging of tagging operations by
>
>    notmuch config set log.subscribers 'name1;name2;name3'

The command "notmuch log" lets one read the queue from the shell.
"notmuch log name1" dumps any messages queued for "name1" to stdout
(only the string content; there are timestamps but these are currently
ignored). So one can interact with this queue without learning about
the jlog API (or more precisely, without copy-pasting the example
programs from the wiki like I did).

Part 2: Ok, but what is it good for?
====================================

OK, so there is this tool, but why should we bother?

I think tag synchronization is one of the big missing pieces for
notmuch (probably because Carl only reads mail on one computer ;) ).
There are various hacks, but they are all based on dump/restore.
nmbug only manages to have (mostly) acceptable performance because the
query of "tags starting with notmuch::" (done in a hacky way)
restricts the output to manageable levels.

I think what we need is a way to update incrementally, and the obvious
way to do this for tags is to keep track of additions and deletions,
and maintain a "shadow" of the database on disk in some form more
amenable to synchronization. This could be a directory/file tree like
the initial versions of nmbug, or some slightly fancier thing like the
bare git repo used by current versions of nmbug.

With the jlog patches to notmuch, one or more scripts could run (in
cron, or perhaps using something like inotify) to treat the log of
tagging operations as essentially a patch to update the "shadow tag
database". In my case I would probably want two subscribers, one for
my whole tag database, and one to update the set published in nmbug.

A wilder idea would be to use the queue to help resolve contention for
write access to the Xapian database. Clients would write into a queue,
and notmuch would read operations to perform out of the queue. To be
honest, I'm not sure this is really better than just having clients
use locking and blocking/retrying.

Part 3: Couldn't we do this with hooks?
=======================================

Conceptually, yes. But there are a few things to figure out:

1) Hooks are a CLI feature, not a library feature. Do we want the
   library to support something like hooks?

2) The cost of an exec per elementary tagging operation is quite high;
   maybe some kind of batching could help with this.

3) Atomicity/locking would need to be dealt with by each script.
   For example git update-index, used by nmbug (and git add) will fail
   if some other operation is in progress. But having a hook block
   sounds a bit nightmarish.

None of this stuff is my area of expertise; maybe some of you have
clearer ideas about how this could/should be handled.

d
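
P.S. To make the model from Part 1 concrete (one queue, one head
pointer per subscriber, discard whatever every subscriber has read
past), here is a rough in-memory sketch in Python. It is only an
illustration: the class and method names are made up for this mail,
nothing here is the jlog API, and the real thing keeps the structure
on disk rather than in a Python list.

```python
class PubSubLog:
    """Toy model: a single queue plus one read pointer per subscriber."""

    def __init__(self, subscribers):
        self.entries = []   # messages not yet read by every subscriber
        self.base = 0       # absolute index of entries[0]
        # Absolute index of the next message each subscriber will read.
        self.heads = {name: 0 for name in subscribers}

    def publish(self, message):
        """Append one message; every subscriber will eventually see it."""
        self.entries.append(message)

    def read(self, name):
        """Return everything queued for `name` and advance its pointer."""
        start = self.heads[name] - self.base
        pending = self.entries[start:]
        self.heads[name] = self.base + len(self.entries)
        self._discard_consumed()
        return pending

    def _discard_consumed(self):
        """Drop the entries that all subscribers have read past."""
        oldest = min(self.heads.values())
        del self.entries[:oldest - self.base]
        self.base = oldest
```

With subscribers "name1" and "name2", a message stays in the list
until both have read it; as soon as the slower of the two catches up,
the shared prefix is dropped.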