Re: [PATCH 0/5] Store message modification times in the DB

Subject: Re: [PATCH 0/5] Store message modification times in the DB

Date: Mon, 19 Dec 2011 14:48:21 -0500

To: David Edmondson

Cc: notmuch@notmuchmail.org

From: Austin Clements


Quoth David Edmondson on Dec 19 at  4:34 pm:
> On Tue, 13 Dec 2011 18:11:40 +0100, Thomas Jost <schnouki@schnouki.net> wrote:
> > This is a patch series I've been working on for some time in order to be
> > able to sync my tags on several computers. I'm posting it now, but
> > please consider it as a RFC rather than something that is ready to be
> > pushed.
> > 
> > The basic idea is to store the last time each message was modified, i.e. "the
> > message was added to the DB", "a tag was added" or "a tag was removed".
> 
> Thomas, this is interesting. Do you have a (back of the envelope?)
> design for how you will use this information to implement tag sync?
> 
> My gut feeling is that we need a log of when a change occurred rather
> than the last modification time, but I haven't really thought that all
> through properly.

Here are sketches for two sync algorithms with different properties.
I haven't proven these to be correct, but I believe they are.  In
both, R is the remote host and L is the local host.  They're both
one-way (they only update tags on L), but should be symmetrically
stable.


== Two-way "merge" from host R to host L ==

Per-host state:
- last_mtime: Map from remote hosts to last sync mtime

new_mtime = last_mtime[R]
For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]:
  If mtime > local mtime of msgid:
    Set local tags of msgid to tags
  new_mtime = max(new_mtime, mtime)
last_mtime[R] = new_mtime

This has the advantage of keeping very little state, but the
synchronization is also quite primitive.  If two hosts change a
message's tags in different ways between synchronizations, the more
recent of the two will override the full set of tags on that message.
This does not strictly require tombstones, though if you make a tag
change and then delete the message before a sync, the tag change will
be lost without some record of that state.  Also, this obviously
depends heavily on synchronized clocks.
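
As a rough illustration, the pseudocode above could look something
like this in Python.  This is only a sketch under assumptions that are
not part of the proposal: the tag database is modelled as a plain dict
from msgid to (mtime, tags), messages that don't exist locally are
skipped, and an overwritten message takes over the remote mtime.  The
names Store and sync_two_way are made up for the example.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical in-memory stand-in for a tag database: msgid -> (mtime, tags).
Store = Dict[str, Tuple[int, FrozenSet[str]]]


def sync_two_way(remote: Store, local: Store,
                 last_mtime: Dict[str, int], host: str) -> None:
    """One-way pass that updates `local` from `remote` (host R)."""
    cutoff = last_mtime.get(host, 0)
    new_mtime = cutoff
    for msgid, (mtime, tags) in remote.items():
        if mtime < cutoff:
            continue  # not modified since the last sync with this host
        # Assumption: a message missing locally is skipped, not created.
        if msgid in local and mtime > local[msgid][0]:
            # The newer side wins and replaces the whole tag set.
            # Assumption: the local mtime becomes the remote mtime.
            local[msgid] = (mtime, tags)
        new_mtime = max(new_mtime, mtime)
    last_mtime[host] = new_mtime
```

Note that a message whose local mtime is newer keeps all of its local
tags, which is the "more recent overrides the full set" behavior
described above.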


== Three-way merge from host R to host L ==

Per-host state:
- last_mtime: Map from remote hosts to last sync mtime
- last_sync: Map from remote hosts to the tag database as of the last sync

new_mtime = last_mtime[R]
for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]:
  my_tags = local tags of msgid
  last_tags = last_sync[R][msgid]
  for each tag that differs between my_tags and r_tags:
    if tag is in last_tags: remove tag locally
    else: add tag locally
  last_sync[R][msgid] = r_tags
  new_mtime = max(new_mtime, mtime)
Delete stale messages from last_sync[R] (using tombstones or something)
last_mtime[R] = new_mtime

This protocol requires significantly more state, but can also
reconstruct per-tag changes.  Conflict resolution is equivalent to
what git would do and is based solely on the current local and remote
state and the common ancestor state.  This can lead to unintuitive
results if a tag on a message has gone through multiple changes on
both hosts since the last sync (though, I argue, there are no
intuitive results in such situations).  Tombstones are only required
to garbage collect sync state (and other techniques could be used for
that).  This also does not depend on time synchronization (though,
like any mtime solution, it does depend on mtime monotonicity).  The
algorithm would work equally well with sequence numbers.
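
A minimal Python sketch of the three-way version follows, again under
assumptions of my own rather than anything in the patch series: the
remote database is a dict from msgid to (mtime, tags), the local one a
dict from msgid to a tag set, the remote tag set is what gets recorded
as the new common ancestor, and garbage collection of stale entries is
left out.  The names merge_tags and sync_three_way are hypothetical.

```python
from typing import Dict, Set, Tuple


def merge_tags(my_tags: Set[str], r_tags: Set[str],
               last_tags: Set[str]) -> Set[str]:
    """Three-way merge of one message's tags against the ancestor."""
    merged = set(my_tags)
    for tag in my_tags ^ r_tags:      # tags present on exactly one side
        if tag in last_tags:
            merged.discard(tag)       # was in the ancestor: one side removed it
        else:
            merged.add(tag)           # not in the ancestor: one side added it
    return merged


def sync_three_way(remote: Dict[str, Tuple[int, Set[str]]],
                   local: Dict[str, Set[str]],
                   last_mtime: Dict[str, int],
                   last_sync: Dict[str, Dict[str, Set[str]]],
                   host: str) -> None:
    """One-way pass that merges tags from `remote` (host R) into `local`."""
    cutoff = last_mtime.get(host, 0)
    new_mtime = cutoff
    ancestor = last_sync.setdefault(host, {})
    for msgid, (mtime, r_tags) in remote.items():
        if mtime < cutoff:
            continue  # not modified since the last sync with this host
        my_tags = local.get(msgid, set())
        last_tags = ancestor.get(msgid, set())
        local[msgid] = merge_tags(my_tags, r_tags, last_tags)
        # Assumption: the remote state seen now is the next common ancestor.
        ancestor[msgid] = set(r_tags)
        new_mtime = max(new_mtime, mtime)
    last_mtime[host] = new_mtime
```

For example, with ancestor {inbox, unread}, local {inbox, unread,
todo} and remote {inbox}, the merge keeps the local addition of "todo"
and applies the remote removal of "unread".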


I tried coming up with a third algorithm that used mtimes to resolve
tagging conflicts, but without per-tag mtimes it degenerated into the
first algorithm.
