Re: [notmuch] nested tag trees (was: Mail in git)

Subject: Re: [notmuch] nested tag trees (was: Mail in git)

Date: Thu, 18 Feb 2010 00:10:47 -0500

To: martin f krafft

Cc: notmuch

From: Ben Gamari


Excerpts from martin f krafft's message of Wed Feb 17 23:59:43 -0500 2010:
> also sprach Ben Gamari <bgamari@gmail.com> [2010.02.18.1744 +1300]:
> > I believe you would. The problem isn't the messages (well, that's
> > a problem too), it's the fact that the tree (e.g. tag) objects
> > which reference the messages are immutable (I believe). This
> > presents us with the difficult circumstance of being unable to
> > modify a tag after it has been created. Therefore, as far as I can
> > tell, we need to rewrite the tag's tree object whenever we add or
> > remove a message. This was the reason I suggested nesting tag
> > trees, although this only partially solves the issue.
> 
> You are absolutely right, and I think nesting tag trees is an
> interesting idea to pursue. It *would* make it impossible to ever
> check out the metatree into the filesystem, or rather result in
> subdirectories that the user shouldn't need to worry about.
> 
Yeah, this is a bit of a bummer. This is really a stretch, but I wonder
if the git folks would accept patches/minor database semantics changes
in the name of making git more flexible as a general purpose object
database. I really doubt it, but you never know.

> Instead of nested subtrees, think of 16 subtrees forming a level-1
> hash table, or 256 for level-2, which really *ought* to be enough.
> 
> Anyway, rewriting a tree object is pretty much exactly the same as
> removing a line (e.g. a message ID) from a file (e.g. a tag), as
> that file would have to be fully rewritten.
> 
This is very true, but what exactly do you mean by this statement?
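
For what it's worth, here is roughly how I picture the bucketing idea. This
is only a toy model, not real Git plumbing: the ShardedTag class, the
one-ID-per-line payload and the example.org message IDs are all made up for
illustration. Only object_id() mirrors how Git actually names an object
(SHA-1 over the type, the size, a NUL byte and the content). The sketch
tries to show that adding a message changes one bucket and the root, while
every other bucket keeps its ID.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    # Git names an object by SHA-1 over "<kind> <size>", a NUL byte, then the payload.
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


class ShardedTag:
    """Toy model of one tag stored as a root tree of hash buckets (hypothetical)."""

    def __init__(self, level: int = 2):
        self.level = level      # 1 -> 16 buckets, 2 -> 256 buckets
        self.buckets = {}       # bucket name -> set of message IDs

    def bucket_name(self, message_id: str) -> str:
        # Leading hex digits of the message ID's SHA-1 select the bucket.
        return hashlib.sha1(message_id.encode()).hexdigest()[: self.level]

    def bucket_id(self, name: str) -> str:
        # Simplified payload: sorted message IDs, one per line (not Git's tree format).
        payload = "\n".join(sorted(self.buckets[name])).encode()
        return object_id("blob", payload)

    def root_id(self) -> str:
        # The root lists every bucket with its ID, so any bucket change alters it.
        lines = [f"{n} {self.bucket_id(n)}" for n in sorted(self.buckets)]
        return object_id("tree", "\n".join(lines).encode())

    def add(self, message_id: str) -> int:
        """Add a message; return how many entries its bucket now holds."""
        name = self.bucket_name(message_id)
        self.buckets.setdefault(name, set()).add(message_id)
        return len(self.buckets[name])


tag = ShardedTag(level=2)
for i in range(10000):
    tag.add(f"<msg{i}@example.org>")

before = {n: tag.bucket_id(n) for n in tag.buckets}
old_root = tag.root_id()
rewritten = tag.add("<new@example.org>")
touched = tag.bucket_name("<new@example.org>")
unchanged = [n for n in before if n != touched and before[n] == tag.bucket_id(n)]
print(rewritten, "bucket entries rewritten instead of", 10001)
print(len(unchanged), "buckets untouched; root changed:", old_root != tag.root_id())
```

In this model each addition re-serializes one bucket plus the root listing
(at most 256 entries at level 2), rather than every message ID under the
tag. The root still has to be rewritten every time, which is why I said
nesting only partially solves the issue.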

> > Yeah, I'm not sure how well this would scale on truly massive mail
> > stores.
> 
> The more I think about this, the more I want to implement this
> between evenless and Git, i.e. as a porcelain layer, since then
> I could also use it for vcs-home[0]. In fact, maybe one day we can
> store ~ and mail all in one Git repo, with different porcelains for
> different use-cases, and notmuch indexing it all anyway. ;)

It would be nice if git just didn't attach so many semantics to its
object types and left more up to the porcelain. Git is a fantastic
database; unfortunately, it seems you need to work around a lot of VCS
behavior in order to make use of it in a non-VCS application. Attaching
less meaning to database objects would make things substantially easier.

> Let's continue the technical discussion on the Git list, okay?
> 

Yep. As soon as Majordomo sends me my confirmation.

Cheers,

- Ben
