Re: experimental logging branch

Subject: Re: experimental logging branch
Date: Sun, 11 Dec 2011 20:12:13 -0400
To: Notmuch Mail
Cc:
From: David Bremner


On Sat, 10 Dec 2011 23:27:02 -0400, David Bremner <david@tethera.net> wrote:
> This uses the jlog library (http://labs.omniti.com/labs/jlog)
> to atomically log messages in a pub-sub model.

Some more explanation.

Part 1: pub-what?
=================

What is pub-sub? Conceptually, you can think of it as a set of queues
where each "published" message is magically replicated and put in the
queue of each "subscriber". Of course, in practice one only needs one
queue and a record of how far each subscriber has read. So we have a
queue and one head pointer per subscriber. We can discard anything in
the queue that all of the subscribers have read past.
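To make the "one queue, one head pointer per subscriber" picture concrete, here is a minimal in-memory sketch. It is only an illustration of the idea, not jlog's implementation (jlog keeps the equivalent structure on disk); the class and method names are made up for this example.

```python
from collections import deque


class PubSubQueue:
    """One shared queue plus one read position ("head") per subscriber.

    Illustrative in-memory model only; names are hypothetical.
    """

    def __init__(self, subscribers):
        self._items = deque()  # messages not yet read by every subscriber
        self._base = 0         # absolute index of self._items[0]
        self._heads = {name: 0 for name in subscribers}  # next index to read

    def publish(self, message):
        # A published message is stored once, not copied per subscriber.
        self._items.append(message)

    def read(self, subscriber):
        """Return every message this subscriber has not seen yet."""
        start = self._heads[subscriber]
        end = self._base + len(self._items)
        msgs = [self._items[i - self._base] for i in range(start, end)]
        self._heads[subscriber] = end
        self._discard_read()
        return msgs

    def _discard_read(self):
        # Drop everything that all subscribers have read past.
        low = min(self._heads.values(), default=self._base + len(self._items))
        while self._base < low:
            self._items.popleft()
            self._base += 1

    def __len__(self):
        return len(self._items)
```

With subscribers "name1" and "name2", a message stays in the queue after "name1" reads it and is dropped only once "name2" has read it too.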

In jlog this data structure lives on disk, which is why it is called
"durable". It also means there are no sockets to communicate between
the publisher (notmuch) and the subscribers (e.g. a proposed tag
syncing tool, described below). They interact via a directory
(currently under .notmuch). Because the on-disk data structure is not
completely trivial (not that fancy either, but more than a stream of
bytes), both writers and readers need to use the jlog library to
interact with the queue. "notmuch log" is one such reader.
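As a rough idea of what "a queue that lives in a directory" can look like, here is a toy version: an append-only log file plus one checkpoint file per subscriber holding a byte offset. This is NOT jlog's on-disk format, and the file names are assumptions. It also does no locking between concurrent writers and never discards old data, which is exactly the kind of detail a library like jlog takes care of.

```python
import os


class DirQueue:
    """Toy durable queue stored in a directory (not jlog's format).

    log          : append-only file, one message per line
    <name>.ckpt  : byte offset up to which subscriber <name> has read
    """

    def __init__(self, path, subscribers):
        self.path = path
        os.makedirs(path, exist_ok=True)
        self.log = os.path.join(path, "log")
        open(self.log, "ab").close()  # create the log if it is missing
        for name in subscribers:
            if not os.path.exists(self._ckpt(name)):
                self._save_offset(name, 0)

    def _ckpt(self, name):
        return os.path.join(self.path, name + ".ckpt")

    def _save_offset(self, name, offset):
        # Write a temp file, then rename it over the old checkpoint.
        # os.replace is atomic on POSIX, so a crash leaves either the
        # old or the new offset, never a half-written one.
        tmp = self._ckpt(name) + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(offset))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._ckpt(name))

    def publish(self, message):
        if "\n" in message:
            raise ValueError("toy format: one message per line")
        with open(self.log, "ab") as f:
            f.write(message.encode() + b"\n")
            f.flush()
            os.fsync(f.fileno())

    def read(self, name):
        with open(self._ckpt(name)) as f:
            offset = int(f.read())
        with open(self.log, "rb") as f:
            f.seek(offset)
            data = f.read()
        # Consume only complete lines; a partial trailing write stays unread.
        end = data.rfind(b"\n") + 1
        self._save_offset(name, offset + end)
        return [line.decode() for line in data[:end].split(b"\n")[:-1]]
```

Because all state is in files, a subscriber can exit and later resume from its checkpoint without any daemon being involved.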

I'm not that invested in jlog, but I looked around and didn't find any
other similar solution that offered some atomicity guarantees without
some kind of broker (read: yet another daemon running on the machine).

> On this branch you can enable logging of tagging operations by 
> 
>    notmuch config set log.subscribers 'name1;name2;name3'

The command "notmuch log" lets one read the queue from the shell.
"notmuch log name1" dumps any messages queued for "name1" to stdout
(only the string content; there are timestamps, but these are currently
ignored). So one can interact with this queue without learning the
jlog API (or, more precisely, without copy-pasting the example programs
from the wiki like I did).
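The reader side described above boils down to two small steps: turn the 'name1;name2;name3' setting into a list of subscriber names, and write out only the text of each queued record, dropping the timestamp. A sketch of both, assuming a hypothetical (timestamp, text) record shape and that empty names and surrounding whitespace are ignored; neither detail is taken from the actual branch.

```python
import sys
from typing import Iterable, List, NamedTuple, TextIO


class LogRecord(NamedTuple):
    # Hypothetical record shape: each message carries a timestamp,
    # which the dump below ignores, as "notmuch log" currently does.
    timestamp: float
    text: str


def parse_subscribers(value: str) -> List[str]:
    """Split a 'name1;name2;name3' setting into subscriber names."""
    return [name.strip() for name in value.split(";") if name.strip()]


def dump(records: Iterable[LogRecord], out: TextIO = sys.stdout) -> int:
    """Write only the string content of each record, one per line."""
    count = 0
    for rec in records:
        out.write(rec.text + "\n")
        count += 1
    return count
```

Printing plain lines to stdout keeps subscriber scripts simple: they can consume the output with ordinary shell tools and never link against the queue library.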

Part 2: Ok, but what is it good for?
====================================

OK, so there is this tool, but why should we bother? I think tag
synchronization is one of the big missing pieces for notmuch (probably
because Carl only reads mail on one computer ;) ). There are various
hacks, but they are all based on dump/restore. nmbug only manages to
have (mostly) acceptable performance because the query of "tags starting
with notmuch::" (done in a hacky way) restricts the output to manageable
levels. I think what we need is a way to update incrementally, and the
obvious way to do this for tags is to keep track of additions and
deletions, and maintain a "shadow" of the database on disk in some form
more amenable to synchronization. This could be a directory/file tree
like the initial versions of nmbug, or something slightly fancier like
the bare git repo used by current versions of nmbug. With the jlog
patches to notmuch, one or more scripts could run (from cron, or perhaps
triggered by something like inotify) and treat the log of tagging
operations as essentially a patch to update the "shadow tag database".
In my case I would probably want two subscribers: one for my whole tag
database, and one to update the set published in nmbug.
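Treating the log as a patch amounts to replaying additions and deletions, in order, onto the shadow. A minimal sketch, assuming a hypothetical entry format of "+tag message-id" / "-tag message-id" (the real log contents may differ) and a shadow that is just a dict from message-id to a set of tags. The optional prefix filter shows how an nmbug-style subscriber could keep only "notmuch::" tags while another subscriber mirrors everything.

```python
from typing import Dict, Iterable, Set


def apply_tag_log(shadow: Dict[str, Set[str]], entries: Iterable[str],
                  prefix: str = "") -> None:
    """Replay logged tag changes, in order, onto a shadow tag database.

    shadow maps message-id -> set of tags. The entry format is
    hypothetical: "+tag message-id" or "-tag message-id". Only tags
    starting with `prefix` are applied (empty prefix = all tags).
    """
    for entry in entries:
        op_tag, msg_id = entry.split(None, 1)
        op, tag = op_tag[:1], op_tag[1:]
        if op not in ("+", "-"):
            raise ValueError(f"bad log entry: {entry!r}")
        if not tag.startswith(prefix):
            continue
        if op == "+":
            shadow.setdefault(msg_id, set()).add(tag)
        else:
            tags = shadow.get(msg_id)
            if tags is not None:
                tags.discard(tag)
                if not tags:  # keep the shadow free of empty entries
                    del shadow[msg_id]
```

Only the changed messages are touched, so the cost is proportional to the number of tagging operations since the last run rather than to the size of the whole database, which is the point of going incremental instead of dump/restore.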

A wilder idea would be to use the queue to help resolve contention for
write access to the Xapian database. Clients would write into a queue,
and notmuch would read the operations to perform out of the queue. To
be honest, I'm not sure this is really better than just having clients
use locking and blocking/retrying.
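The queue-based alternative is essentially a single-writer pattern: many clients enqueue requests, and exactly one writer drains them and touches the database. Here is a sketch of that pattern only, with threads and an in-process queue standing in for separate processes and a durable on-disk queue, and a dict standing in for the Xapian database; the operation shape is an assumption.

```python
import queue
import threading
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical operation shape: (op, tag, message_id), op is "+" or "-".
Op = Tuple[str, str, str]


def writer(ops: "queue.Queue[Optional[Op]]", db: Dict[str, Set[str]]) -> None:
    """The only code that touches db: apply operations in arrival order."""
    while True:
        item = ops.get()
        if item is None:  # sentinel: no more work
            break
        op, tag, msg_id = item
        tags = db.setdefault(msg_id, set())
        if op == "+":
            tags.add(tag)
        else:
            tags.discard(tag)


def client(ops: "queue.Queue[Optional[Op]]", batch: List[Op]) -> None:
    """A client never opens the database; it only enqueues requests."""
    for item in batch:
        ops.put(item)


def run(client_batches: List[List[Op]]) -> Dict[str, Set[str]]:
    ops: "queue.Queue[Optional[Op]]" = queue.Queue()
    db: Dict[str, Set[str]] = {}
    w = threading.Thread(target=writer, args=(ops, db))
    w.start()
    clients = [threading.Thread(target=client, args=(ops, b))
               for b in client_batches]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    ops.put(None)  # all clients finished, so stop the writer
    w.join()
    return db
```

Clients never block on the database lock, but they also don't learn when (or whether) their change was applied, which is part of why plain locking with blocking/retrying may be just as good.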

Part 3: Couldn't we do this with hooks?
=======================================

Conceptually, yes. But there are a few things to figure out:

1) hooks are a CLI feature, not a library feature. Do we want the
   library to support something like hooks?

2) The cost of an exec per elementary tagging operation is quite high;
   maybe some kind of batching could help with this.

3) Atomicity/locking would need to be dealt with by each script. For
   example, git update-index, used by nmbug (and git add), will fail if
   some other operation is in progress. But having a hook block sounds
   a bit nightmarish.
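For point 2, batching could look roughly like this: collect the elementary operations and exec the hook once per batch, passing the whole batch on stdin, instead of once per operation. This is only a sketch under assumptions (the hook command and the one-operation-per-line stdin protocol are hypothetical), and it does nothing about the atomicity problem in point 3.

```python
import subprocess
from typing import Callable, Iterable, List


def batched(ops: Iterable[str], size: int) -> Iterable[List[str]]:
    """Group operations into lists of at most `size` items."""
    batch: List[str] = []
    for op in ops:
        batch.append(op)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def exec_hook(argv: List[str]) -> Callable[[List[str]], None]:
    """Return a runner that execs `argv` once, feeding a batch on stdin."""
    def run(batch: List[str]) -> None:
        subprocess.run(argv, input="\n".join(batch) + "\n",
                       text=True, check=True)
    return run


def dispatch(ops: Iterable[str], run_hook: Callable[[List[str]], None],
             size: int = 100) -> int:
    """Call the hook once per batch; return how many times it ran."""
    calls = 0
    for batch in batched(ops, size):
        run_hook(batch)
        calls += 1
    return calls
```

With a batch size of 100, tagging 1000 messages costs 10 execs instead of 1000; the trade-off is that a failing hook now fails for a whole batch at once.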

None of this stuff is my area of expertise; maybe some of you have
clearer ideas about how this could/should be handled.

d