Excerpts from martin f krafft's message of Tue Feb 16 20:21:01 -0500 2010: > What I am wondering is if (explicit) tags couldn't be represented as > tree-objects with this. > > evenless-link — link a message object with a tree object > evenless–unlink – unlink a message object from tree object > [replaces evenless-unlink] I was actually wondering this very thing. I'd just be worried about tags with large numbers of messages (presumably we would need an All tag, that would contain a reference to every known message). It seems like the simple act of adding a message to the repository could turn into an extremely expensive operation. Moreover, deleting a message could also be quite expensive as this will require rewriting all of the tags that reference it. Surely, we would need to batch these sort of operations to avoid disasterous performance. However, even with batching, it seems we would face some pretty serious scalability issues. I think if we were to implement tag storage in trees, we'd need to use a multi-level tree. This way we could avoid rewriting a tree object containing all of the tag's messages on every change. I apologize if this was already obvious to everyone but me. > > messages would then be deleted whenever using git-gc. > > No idea how this would sync if we don't keep ancestry. Otoh, it > would probably not be very expensive to do just that. I think that keeping the ancestry would be quite important and would come with relatively low overhead given the correct dereferencing of data structures. > > notmuch would then only search and provide the hash ID(s); tags > would be a function of storage. > > Is it possible to find out all trees that reference a given object > with Git in constant or sub-linear time? > I don't believe so. I think this is one of the reasons why git gc is so expensive. - Ben