Re: On disk tag storage format

Subject: Re: On disk tag storage format

Date: Wed, 20 Feb 2013 21:29:30 -0400

To: notmuch mailing list

From: David Bremner


David Bremner <david@tethera.net> writes:

> Austin outlined on IRC a way of representing tags on disk as hardlinks
> to messages. In order to make the discussion more concrete, I wrote a
> prototype in Python to dump the notmuch database to this format. On my
> 250k messages, this creates 40k new hardlinks, and uses about 5M of
> disk space. The dump process takes about 20s on my Core i7 machine.
> With symbolic links, the same database takes about 150M of disk space;
> this isn't great but it isn't unbearable either.
>
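For anyone who hasn't looked at the script, here is a minimal sketch of
the basic idea. It is not the prototype itself: `dump_tags_as_hardlinks`
is a hypothetical name, the `messages` dict (message path -> tags) stands
in for a notmuch query, and the `<tag>/cur/<basename>` layout is an
assumption. Tag names are used verbatim as directory names, and hard
links only work if the farm is on the same filesystem as the mail store.

```python
import os
from typing import Dict, Iterable


def dump_tags_as_hardlinks(messages: Dict[str, Iterable[str]], farm_root: str) -> int:
    """Create <farm_root>/<tag>/cur/<basename> hard links for each (message, tag) pair.

    Hypothetical sketch: `messages` maps a message file path to its tags; in a
    real dump it would be filled from a notmuch query (not shown). Tag names are
    used as directory names without escaping. Returns the number of new links.
    """
    created = 0
    for path, tags in messages.items():
        name = os.path.basename(path)
        for tag in tags:
            # Give each tag directory the cur/new/tmp layout a maildir client expects.
            tag_dir = os.path.join(farm_root, tag)
            for sub in ("cur", "new", "tmp"):
                os.makedirs(os.path.join(tag_dir, sub), exist_ok=True)
            link = os.path.join(tag_dir, "cur", name)
            if not os.path.exists(link):
                # Hard link: no new inode, only a new directory entry.
                os.link(path, link)
                created += 1
    return created
```

Running it a second time over the same data creates no new links, so it
can be re-run after tagging changes (removed tags are not handled here).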

I've been playing a bit with this script, and it seems more or less
usable as a way of mirroring the notmuch tag database to a link farm.

It's a bit faster than my current dump/restore-based approach, although
if you want to keep the results in a git repository, it takes up more
space. Of course, the bonus with this approach is that it creates
"virtual" maildirs for each tag that can be browsed with the maildir
client of your choice.
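Going the other way, from the link farm back to tags, is what would let
this replace a dump/restore. A hypothetical sketch (again not the
prototype's code; `read_tags_from_farm` and the `<tag>/{cur,new}` layout
are assumptions) that relies on every link, hard or symbolic, resolving
to the inode of the original message file:

```python
import os
from collections import defaultdict
from typing import Dict, Set, Tuple


def _inode_index(mail_root: str) -> Dict[Tuple[int, int], str]:
    """Map (device, inode) -> path for every file under mail_root.

    Assumes the link farm lives outside mail_root, so farm entries are not indexed.
    """
    index = {}
    for dirpath, _dirs, files in os.walk(mail_root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            index[(st.st_dev, st.st_ino)] = path
    return index


def read_tags_from_farm(farm_root: str, mail_root: str) -> Dict[str, Set[str]]:
    """Recover {message path: set of tags} from <farm_root>/<tag>/{cur,new}/ entries."""
    index = _inode_index(mail_root)
    tags = defaultdict(set)
    for tag in sorted(os.listdir(farm_root)):
        for sub in ("cur", "new"):
            sub_dir = os.path.join(farm_root, tag, sub)
            if not os.path.isdir(sub_dir):
                continue
            for name in os.listdir(sub_dir):
                try:
                    # os.stat follows symlinks, so hard and symbolic links both
                    # resolve to the inode of the original message file.
                    st = os.stat(os.path.join(sub_dir, name))
                except FileNotFoundError:
                    continue  # dangling symlink: the message has gone away
                path = index.get((st.st_dev, st.st_ino))
                if path is not None:
                    tags[path].add(tag)
    return dict(tags)
```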

The current default is to use a mix of hard and symbolic links, to try
to balance the space consumed in a git repo against the inode
consumption and performance issues of using too many symlinks.
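The rule the prototype uses to pick one or the other isn't spelled out
here, so the following is only a hypothetical policy to illustrate the
trade-off: tags applied to at most `symlink_limit` messages get symlinks,
larger tags get hardlinks. `link_tags_mixed` and the threshold are made
up for the example.

```python
import os
from collections import Counter
from typing import Dict, Iterable


def link_tags_mixed(messages: Dict[str, Iterable[str]], farm_root: str,
                    symlink_limit: int = 100) -> Counter:
    """Link each tagged message into <farm_root>/<tag>/cur/.

    Hypothetical policy (not the prototype's): tags with at most
    `symlink_limit` messages get symbolic links, larger tags get hard links.
    Returns a Counter of links created per kind.
    """
    normalized = {path: set(tags) for path, tags in messages.items()}
    tag_sizes = Counter(tag for tags in normalized.values() for tag in tags)
    made = Counter()
    for path, tags in normalized.items():
        for tag in tags:
            cur = os.path.join(farm_root, tag, "cur")
            os.makedirs(cur, exist_ok=True)
            link = os.path.join(cur, os.path.basename(path))
            if os.path.lexists(link):
                continue
            if tag_sizes[tag] <= symlink_limit:
                # Symlink: costs an inode on disk, but git stores only the target path.
                os.symlink(os.path.abspath(path), link)
                made["symlink"] += 1
            else:
                # Hard link: no extra inode, but git treats it as a regular
                # file and stores its full content.
                os.link(path, link)
                made["hardlink"] += 1
    return made
```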

It's still a prototype: there is not much error checking, and certain
issues are not dealt with at all (the ones I thought about are noted in
comments).
