Re: inbox-update: new competition of notmuch-lore

Subject: Re: inbox-update: new competition of notmuch-lore

Date: Mon, 17 Apr 2023 06:30:18 -0600

To: Michael J Gruber, Felipe Contreras

Cc: Tobias Waldekranz

From: Felipe Contreras

Michael J Gruber wrote:
> > I'm moving from mbsync to public-inbox and I find there aren't many tools to
> > make it work with notmuch.
> Looking at that, too.
> > I gave a try to notmuch-lore [1] but I found it too slow and had a couple of
> > issues.
> >
> > So I wrote my own script to convert public-inbox mailing lists to Maildir
> > format: notmuch-tools/inbox-update [2].
> >
> > It's much faster at the initial clone, it deals with deleted mails, and YAML is
> > a much better configuration format.
> Looking at both scripts: Is the speed-up mainly due to `git cat-file`
> vs. `git show`?

My guess is that it's due to using `git cat-file` in batch mode, so it's called
only once, instead of thousands of times.

Presumably this can be done in notmuch-lore as well, with something like:

  git rev-list HEAD | sed -e 's/$/:m/' | git cat-file --batch

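But this still has the issue that some commits remove mail instead of adding
it. For those, `<commit>:m` does not resolve, and `git cat-file --batch` prints
a `missing` line instead of a blob.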
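
A rough sketch of what consuming that batch output could look like, in Python.
This is not taken from either script: the function names are made up for
illustration, and it assumes every mail lives at the path `m` of its commit.
The point is only that one `git cat-file --batch` process serves all commits,
and that `missing` records can be skipped or handled separately.

```python
import io
import subprocess
from typing import BinaryIO, Iterator, Optional, Tuple


def parse_batch(stream: BinaryIO) -> Iterator[Tuple[str, Optional[bytes]]]:
    """Yield (name, content) for each `git cat-file --batch` record.

    content is None when git reports the requested object as missing.
    """
    while True:
        header = stream.readline()
        if not header:
            return
        fields = header.rstrip(b"\n").split(b" ")
        if fields[-1] == b"missing":
            # "<name> missing": nothing at that path in this commit
            yield b" ".join(fields[:-1]).decode(), None
            continue
        oid, _type, size = fields  # "<oid> <type> <size>"
        content = stream.read(int(size))
        stream.read(1)  # each record is followed by a single newline
        yield oid.decode(), content


def read_mails(git_dir: str, rev: str = "HEAD"):
    """Hypothetical driver: one rev-list call, one cat-file call.

    Buffers the whole output in memory, which is fine for a sketch but
    not for a large epoch.
    """
    commits = subprocess.run(
        ["git", "--git-dir", git_dir, "rev-list", "--reverse", rev],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    names = "".join(f"{c}:m\n" for c in commits).encode()
    out = subprocess.run(
        ["git", "--git-dir", git_dir, "cat-file", "--batch"],
        input=names, check=True, capture_output=True,
    ).stdout
    return parse_batch(io.BytesIO(out))
```

Something like `for name, mail in read_mails("/path/to/epoch.git")` would then
give `mail is None` for the commits that do not add a mail.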

> > Also, you can configure which epochs you want to fetch (notmuch-lore fetches
> > all of them).
> >
> > One thing it doesn't yet do is trim the repository once the mails have been
> > converted, but that's probably easy to add later on.
> What kind of trimming are you thinking about here? Partial history?

Same as notmuch-lore does: just the last commit.

Once the mails have been extracted there's no need for those commits.

> I guess this shows that public-inbox's repo format is simply not the
> best choice for the purpose of mail readers. It is optimised for other
> uses, and I always wondered why they use a non-bare repo at all. That
> single file path m at the root creates absolutely meaningless diffs.
> And the commit message doubles the info which is present in the blob.
> notes-ref could have served better for inspiration of public-inbox.
> (Barking up the wrong tree, I know.)

I don't know if there's a better format. Git stores snapshots anyway, so as
long as the information is retrievable in some way, I think that's fine.

And I clone the public-inbox repositories as bare (mirror, actually); that's
something for the client to decide.

> There are even tools in the public-inbox eco system which feed that
> info into a xapian db, though not notmuch-like, as if notmuch hadn't
> existed already.
> What I'm dreaming of is a notmuch "storage backend" which is git
> object db based rather than maildir based, and compatible with
> public-inbox (at least with the use case, i.e. v3 or v4...). I mean -
> why do we need a checkout of basically immutable files which are
> stored in blobs already, just so that notmuch can index them?

Yep, that's exactly what I want as well.

It should not be that difficult to decouple notmuch from physical files and
feed some virtual content.

> We need them for the MUAs, I know, and we would need a solution for
> them, too. Or simply a tree in public-inbox which allows clients to
> use a mere checkout ...

99% of the time the content is not needed for the MUAs. So perhaps there could
be a way to request the body of the message through libnotmuch, and some
provider of virtual messages retrieves it on demand.
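
To make the idea a bit more concrete, here is a minimal sketch of such a
provider in Python. Everything in it is hypothetical: libnotmuch has no such
hook as far as this thread is concerned, and the class and method names are
invented. It assumes the index already knows which git blob holds each message,
and it only reads the blob the first time a body is requested.

```python
import subprocess
from typing import Callable, Dict


def git_blob_reader(git_dir: str) -> Callable[[str], bytes]:
    """Return a function that reads one blob by object id from git_dir."""
    def read(oid: str) -> bytes:
        return subprocess.run(
            ["git", "--git-dir", git_dir, "cat-file", "blob", oid],
            check=True, capture_output=True,
        ).stdout
    return read


class VirtualMessages:
    """Hypothetical provider: message ids are known up front, bodies load lazily."""

    def __init__(self, read_blob: Callable[[str], bytes]) -> None:
        self._read_blob = read_blob
        self._oids: Dict[str, str] = {}     # message-id -> blob id
        self._cache: Dict[str, bytes] = {}  # message-id -> raw message

    def register(self, message_id: str, oid: str) -> None:
        # Recording where a message lives does not touch the repository
        self._oids[message_id] = oid

    def body(self, message_id: str) -> bytes:
        # Fetch from git only on first access, then serve from the cache
        if message_id not in self._cache:
            self._cache[message_id] = self._read_blob(self._oids[message_id])
        return self._cache[message_id]
```

An MUA would call `body()` only when a message is actually opened, so no
Maildir checkout is needed for the common case.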

Maildir seems like a cumbersome intermediary to me, at the moment.

Felipe Contreras
