Re: [PATCH 2/5] lib: Add a MTIME value to every mail document

Subject: Re: [PATCH 2/5] lib: Add a MTIME value to every mail document

Date: Wed, 21 Dec 2011 02:00:53 +0100

To: Austin Clements

Cc: notmuch@notmuchmail.org

From: Thomas Jost


On Wed, 14 Dec 2011 19:45:07 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> A few minor comments below.
> 
> At a higher level, I'm curious what the tag synchronization protocol
> you're building on top of this is.  I can't think of one that doesn't
> have race conditions, but maybe I'm not thinking about it right.

The approach I've used is quite different from what you described in
id:"20111219194821.GA10376@mit.edu". I don't directly sync host A to
host B but I use a server in the middle. (A is my laptop --> not always
on, B is my work PC --> turned off when I'm out of office, so a direct
sync would be harder to do).

My nm-sync script is written in Python 2 (2.7, may work with 2.6) and is
present on both my PCs and on my server. It can operate in two modes :
client (when run from one of my PCs) or server (called *from the client*
through ssh, running on my server).

When running in server mode, the script manipulates a small DB stored as
a Python dictionary (and stored on disk with the pickle module). It does
not even need notmuch to be installed on the server. Here is what this
DB looks like:
  {
    "lastseen": {
      "pc_A": 1324428029,
      "pc_B": 1323952028
    },
    "messages": {
      "msgid_001": (mtime, tag1, tag2, ..., tagN),
      "msgid_002": (mtime, tag1, tag2, ..., tagM),
      ...
    }
  }

So when running the client, here is what happens:
1. client starts a subprocess: "ssh myserver ~/nm-sync server"
2. client and server check that their sha1sum match (to avoid version
   mismatch)
3. client identifies itself with its hostname ("pc_A" in the example
   above), server replies with its "lastseen" value and updates its in
   the DB
4. server sends to client messages with mtime > lastseen (msgid + mtime
   + tags), client updates the notmuch DB with these values
5. client queries the notmuch DB for messages with mtime > lastseen and
   sends them (msgid + mtime + tags) to the server, which stores them in
   the DB
6. cleanup: server removes messages with mtime < min(lastseen) from its
   DB

So basically this approach assumes that all clocks are synchronized
(everyone uses ntp, right?...) and does not even try to detect
conflicts: if a message has been modified both locally and remotely,
then the local version will be overwritten by the remote one, period. It
should also work with more than 2 hosts (but not tested yet). No sync
data is kept in the notmuch DB.

Right now all of this fits in about 250 lines of Python (could be made
shorter) and works quite well for me. I'll put it online after doing
some cleanup.


> Quoth Thomas Jost on Dec 13 at  6:11 pm:
> > This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> > when the message is added to the database, and is then updated every time a tag
> > is added or removed. It can thus be used for doing incremental dumps of the
> > database or for synchronizing it between several computers.
> > 
> > This value can be read freely (with notmuch_message_get_mtime()) but for now it
> > can't be set to an arbitrary value: it can only be set to "now" when updated.
> > There's no specific reason for this except that I don't really see a real use
> > case for setting it to an arbitrary value.
> > ---
> >  lib/database.cc       |    7 ++++++-
> >  lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
> >  lib/notmuch-private.h |    6 +++++-
> >  lib/notmuch.h         |    4 ++++
> >  4 files changed, 47 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/database.cc b/lib/database.cc
> > index 2025189..6dc6f73 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -81,7 +81,7 @@ typedef struct {
> >   *		        STRING is the name of a file within that
> >   *		        directory for this mail message.
> >   *
> > - *    A mail document also has four values:
> > + *    A mail document also has five values:
> >   *
> >   *	TIMESTAMP:	The time_t value corresponding to the message's
> >   *			Date header.
> > @@ -92,6 +92,9 @@ typedef struct {
> >   *
> >   *	SUBJECT:	The value of the "Subject" header
> >   *
> > + *	MTIME:		The time_t value corresponding to the last time
> > + *			a tag was added or removed on the message.
> > + *
> >   * In addition, terms from the content of the message are added with
> >   * "from", "to", "attachment", and "subject" prefixes for use by the
> >   * user in searching. Similarly, terms from the path of the mail
> > @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
> >  	    date = notmuch_message_file_get_header (message_file, "date");
> >  	    _notmuch_message_set_header_values (message, date, from, subject);
> >  
> > +            _notmuch_message_update_mtime (message);
> 
> Indentation.

Fixed, thanks.

> 
> > +
> >  	    _notmuch_message_index_file (message, filename);
> >  	} else {
> >  	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> > diff --git a/lib/message.cc b/lib/message.cc
> > index 0075425..0c98589 100644
> > --- a/lib/message.cc
> > +++ b/lib/message.cc
> > @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
> >      message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> >  }
> >  
> > +/* Get the message mtime, i.e. when it was added or the last time a tag was
> > + * added/removed. */
> > +time_t
> > +notmuch_message_get_mtime (notmuch_message_t *message)
> > +{
> > +    std::string value;
> > +
> > +    try {
> > +	value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
> > +    } catch (Xapian::Error &error) {
> > +	INTERNAL_ERROR ("Failed to read mtime value from document.");
> > +	return 0;
> > +    }
> 
> For compatibility, this should handle the case when
> NOTMUCH_VALUE_MTIME is missing, probably by just returning 0.  As it
> is, value will be an empty string and sortable_unserialise is
> undefined on strings that weren't produced by sortable_serialise.

Right. I think I rebuilt my DB just after implementing this, which
explains why I did not notice that myself. Thanks!

> > +
> > +    return Xapian::sortable_unserialise (value);
> > +}
> > +
> > +/* Set the message mtime to "now". */
> > +void
> > +_notmuch_message_update_mtime (notmuch_message_t *message)
> > +{
> > +    time_t time_value;
> > +
> > +    time_value = time (NULL);
> > +    message->doc.add_value (NOTMUCH_VALUE_MTIME,
> > +                            Xapian::sortable_serialise (time_value));
> 
> Indentation.

Noted too. It's really time I start using dtrt-indent.

> 
> > +}
> > +
> >  /* Synchronize changes made to message->doc out into the database. */
> >  void
> >  _notmuch_message_sync (notmuch_message_t *message)
> > @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
> >  			private_status);
> >      }
> >  
> > +    _notmuch_message_update_mtime (message);
> > +
> >      if (! message->frozen)
> >  	_notmuch_message_sync (message);
> >  
> > @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
> >  			private_status);
> >      }
> >  
> > +    _notmuch_message_update_mtime (message);
> > +
> >      if (! message->frozen)
> >  	_notmuch_message_sync (message);
> >  
> > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> > index 60a932f..9859872 100644
> > --- a/lib/notmuch-private.h
> > +++ b/lib/notmuch-private.h
> > @@ -95,7 +95,8 @@ typedef enum {
> >      NOTMUCH_VALUE_TIMESTAMP = 0,
> >      NOTMUCH_VALUE_MESSAGE_ID,
> >      NOTMUCH_VALUE_FROM,
> > -    NOTMUCH_VALUE_SUBJECT
> > +    NOTMUCH_VALUE_SUBJECT,
> > +    NOTMUCH_VALUE_MTIME
> >  } notmuch_value_t;
> >  
> >  /* Xapian (with flint backend) complains if we provide a term longer
> > @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
> >  				    const char *from,
> >  				    const char *subject);
> >  void
> > +_notmuch_message_update_mtime (notmuch_message_t *message);
> > +
> > +void
> >  _notmuch_message_sync (notmuch_message_t *message);
> >  
> >  notmuch_status_t
> > diff --git a/lib/notmuch.h b/lib/notmuch.h
> > index 9f23a10..643ebce 100644
> > --- a/lib/notmuch.h
> > +++ b/lib/notmuch.h
> > @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
> >  time_t
> >  notmuch_message_get_date  (notmuch_message_t *message);
> >  
> > +/* Get the mtime of 'message' as a time_t value. */
> > +time_t
> > +notmuch_message_get_mtime (notmuch_message_t *message);
> > +
> >  /* Get the value of the specified header from 'message'.
> >   *
> >   * The value will be read from the actual message file, not from the

-- 
Thomas/Schnouki
part-000.sig (application/pgp-signature)

Thread: