Re: [notmuch] nested tag trees (was: Mail in git)

Subject: Re: [notmuch] nested tag trees (was: Mail in git)

Date: Wed, 17 Feb 2010 23:44:27 -0500

To: martin f krafft

Cc: notmuch

From: Ben Gamari


Excerpts from martin f krafft's message of Wed Feb 17 22:46:13 -0500 2010:
> You ought to have sent to the list, and I want to send mine there
> too, so please give permission.
> 
Oops! Sorry about that. Damn you sup.

> also sprach Ben Gamari <bgamari@gmail.com> [2010.02.18.1620 +1300]:
> > This is a very good point. From what I've read about the database
> > format, I can't think of any way that reverse dependencies could be
> > easily found, unfortunately. If there really is no way to do this, then
> > we could have a problem. I'm not sure rewriting tens of megabytes
> > everytime you receive a mail message is acceptable.
> 
> You would not need to do that, since the messages don't change, and
> thus their blobs remain the same.

I believe you would. The problem isn't the messages (well, that's a
problem too), it's the fact that
the tree (e.g. tab) objects which reference the messages are immutable
(I believe). This presents us with the difficult
circumstance of being unable to modify a tag after it has been created.
Therefore, as far as I can tell, we need to rewrite the tag's tree
object whenever we add or remove a message. This was the reason I
suggested nesting tag trees, although this only partially solves the
issue.

(Please correct me if I'm wrong about any/all of the above)

> 
> However, for every manipulation of a message, you would need to
> iterate *all* tag trees (O(n)) and update the ones referencing the
> message (also O(n)).
> 
This is definitely an issue.

> This can probably be further optimised, but still: it's not quite as
> nice as enumerating all parents of a message in O(1) time (which
> would still result in O(m×n)).
> 
Yeah, I'm not sure how well this would scale on truly massive
mail stores.

Cheers,

- Ben

Thread: