Re: On disk tag storage format

Subject: Re: On disk tag storage format

Date: Thu, 29 Nov 2012 20:34:50 +0100

To: notmuch mailing list

From: Eirik Byrkjeflot Anonsen


David Bremner <david@tethera.net> writes:

> Austin outlined on IRC a way of representing tags on disk as hardlinks
> to messages. In order to make the discussion more concrete, I wrote a
> prototype in python to dump the notmuch database to this format. On my
> 250k messages, this creates 40k new hardlinks, and uses about 5M of
> diskspace. The dump process takes about 20s on
> my core i7 machine.  With symbolic links, the same database takes about
> 150M of disk space; this isn't great but it isn't unbearable either.

And the symlink variant eats 40k inodes, I suppose, which may matter on
some systems.  (Hardlinks do not use extra inodes, as they are just
directory entries pointing to already existing inodes.)
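
To make the inode point concrete, here is a minimal sketch (not taken from David's prototype) that creates one dummy message file plus a hardlink and a symlink to it, then compares inode numbers. The function name `link_inode_usage` and the file names `msg`, `tag-hard` and `tag-soft` are made up for illustration, and it assumes a POSIX-style file system that supports both link types.

```python
import os
import tempfile


def link_inode_usage(workdir):
    """Create a dummy message with one hardlink and one symlink to it.

    Returns (message inode, hardlink inode, symlink inode, link count).
    All file names here are hypothetical, chosen only for this sketch.
    """
    msg = os.path.join(workdir, "msg")
    with open(msg, "w") as f:
        f.write("dummy message\n")

    hard = os.path.join(workdir, "tag-hard")
    soft = os.path.join(workdir, "tag-soft")
    os.link(msg, hard)      # new directory entry, same inode
    os.symlink(msg, soft)   # new inode holding the target path

    # lstat() inspects the link itself instead of following it.
    return (
        os.lstat(msg).st_ino,
        os.lstat(hard).st_ino,
        os.lstat(soft).st_ino,
        os.lstat(msg).st_nlink,
    )


with tempfile.TemporaryDirectory() as d:
    msg_ino, hard_ino, soft_ino, nlink = link_inode_usage(d)
    print("hardlink shares inode:", msg_ino == hard_ino)
    print("symlink has own inode:", msg_ino != soft_ino)
    print("link count on message:", nlink)
```

The hardlink only bumps the link count of the existing inode, while the symlink is a separate file system object with an inode of its own.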

Of course, the space usage also depends on the file system.  For
example, ext2 would use one complete block (typically 4 KiB) per symlink
to store the target path, at least when the target is too long to fit in
the inode itself (60 bytes or more).  ReiserFS would probably use 5M for
the directory entries and another 5M for the symlink data (wild guess).
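
As a rough sanity check on the one-block-per-symlink estimate, the arithmetic can be sketched as below. It assumes exactly 40k symlinks and a 4 KiB block size, both taken from the figures quoted above and not measured anywhere; the variable names are made up.

```python
# Back-of-the-envelope estimate: one full data block per symlink.
# Both inputs are assumptions taken from the thread, not measurements.
num_symlinks = 40_000   # "40k" links from the quoted dump
block_size = 4096       # bytes, a typical ext2 block size

total_bytes = num_symlinks * block_size
total_mib = total_bytes / 2**20

print(f"{total_mib:.2f} MiB")  # 156.25 MiB
```

That is in the same ballpark as the 150M reported for the symlink dump.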

eirik
