Re: [notmuch] Mail in git

Subject: Re: [notmuch] Mail in git

Date: Wed, 17 Feb 2010 21:07:28 +1100

To: Ben Gamari, notmuch

Cc:

From: Stewart Smith


On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stewart@flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
> 
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
> 
> I'm going to play with this....

and I did.

good news... on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes):

using the (attached) evenless.pl to create a single commit with
everything in it:

$ du -sh .git
3.4G	.git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes.

In both cases, i was creating the repository on another spindle (USB2.0
disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.

The next thing to think about is adding objects as they come
in... creating a new commit with just an added file should be pretty
simple and easy... but this means we get to keep a "revision history" of
the mailstore, which is *possibly* not ideal in terms of storage
efficiency (i'll do a trial with mine of doing one message at a time and
seeing what the end size is).

however... commit per added mail (or mails) does give us the advantage
of a really well documented and tested backup system :)

Deleting could be hard.. if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).

for the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------


real	107m43.130s
user	45m25.430s
sys	2m49.440s





-- 
Stewart Smith

Thread: