Re: Info about notmuch database

Subject: Re: Info about notmuch database
Date: Thu, 5 Jan 2012 16:38:07 +0100
To: notmuch@notmuchmail.org
From: boyska


On Thu, Jan 05, 2012 at 04:04:22PM +0100, Thomas Jost wrote:
> On Wed, 04 Jan 2012 15:49:19 +0000, boyska <piuttosto@logorroici.org> wrote:
> > Hello!
> > I like notmuch a lot, so I'm writing a (conceptually) similar piece of
> > software for address books: it will scan all your emails, storing email
> > addresses in a Xapian database (you can think of it as the Little
> > Brother's Database[1] on steroids).
> > The part that I'd like to re-implement is "notmuch new": it seems that
> > the Xapian db holds not only information about each mail, but also the
> > mtime of each directory. My impression is that this is "chaotic", but
> > I am probably just missing the point.
> > 
> > So, here's the question: how is the db "structured"? Is there any
> > documentation to look at?
> > 
> > [1] http://www.spinnaker.de/lbdb/
> > 
> > -- 
> > boyska
> > GPG: 0x520CE393
> 
> There's a description of the DB "schema" in lib/database.cc in the
> notmuch source code. But you may also consider just using libnotmuch
> instead, if that's enough for what you want to do.

Thanks, found it; it's much clearer now.
But I really can't understand why these things aren't just put in a
separate file :) Atomic consistency issues?
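
The "notmuch new" trick discussed above (remembering each directory's mtime so that unchanged directories can be skipped) can be sketched in a few lines of Python. This is not notmuch's code: `changed_directories` is a hypothetical helper that keeps the mtimes in a plain dict, which could then be saved to the separate file suggested here (for example with the `json` module). It relies on the fact that adding, removing or renaming a file in a directory updates that directory's mtime, so only the files in the returned directories need to be re-read.

```python
import os


def changed_directories(root, saved_mtimes):
    """Return the directories under `root` whose mtime differs from the one
    recorded in `saved_mtimes`, updating `saved_mtimes` in place.

    Hypothetical helper for illustration; not taken from notmuch.
    """
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime
        # A new, deleted or renamed file changes the directory's mtime,
        # so an unchanged mtime means the directory can be skipped.
        if saved_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
            saved_mtimes[dirpath] = mtime
    return changed
```

One consequence of keeping the mtimes outside the index: if the program saves the updated dict but stops before the new messages are indexed, the next run will skip those directories.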

> Also: why Xapian? I'm already using something similar that I wrote in
> Python, storing everything in a dictionary and using Pickle to save it to
> disk: 162 lines of code and 45 kB of data are enough to store my
> address book and have completion in Emacs...

The dictionary approach is fine for managing a "manual" address book, where
you store addresses yourself. But what I want is an _automatic_ address
book, like lbdb, which just indexes every email it sees.
The grep approach is better from this point of view, but still not
advanced enough for me.
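
An automatic address book in the lbdb sense starts by pulling every address out of every message it sees. A minimal sketch using Python's standard `email` package follows; `addresses_in` is a hypothetical name, and treating only the From, To and Cc headers as "seen" addresses is an assumption.

```python
from email import message_from_string
from email.utils import getaddresses

# Assumption: these are the headers whose addresses count as "seen".
ADDRESS_HEADERS = ("From", "To", "Cc")


def addresses_in(raw_message):
    """Return the set of lowercased addresses found in the address headers
    of one raw RFC 822 message (hypothetical helper)."""
    msg = message_from_string(raw_message)
    values = []
    for header in ADDRESS_HEADERS:
        # get_all returns every occurrence of the header, or [] if absent.
        values.extend(msg.get_all(header, []))
    # getaddresses splits "Name <addr>, addr2" lists into (name, addr) pairs.
    return {addr.lower() for _name, addr in getaddresses(values) if addr}
```
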
For example, I'd like to store "co-occurrences": if an address appears in
the same mail as another one, the two should be related; your address,
for instance, should be correlated with the notmuch mailing list, because
you wrote to it (these should be zero-weighted Xapian terms). Also, I want
to give more importance to addresses that are seen frequently, and much
less to those seen rarely. Xapian makes this really easy, so the question
is "why not use it?" ;)
