Re: [notmuch] notmuch new: Memory problem

Subject: Re: [notmuch] notmuch new: Memory problem
Date: Mon, 23 Nov 2009 17:26:41 +0100
To: Carl Worth
Cc: notmuch@notmuchmail.org
From: Dominik Epple

Hi,

2009/11/20 Carl Worth <cworth@cworth.org>:
> On Fri, 20 Nov 2009 09:56:50 +0100, Dominik Epple <dominik.epple@googlemail.com> wrote:
>> Is there a problem with the number of my mails? I currently have over
>> 40.000 Mails... they live currently in mbox files, I created a Maildir
>> with mb2md-3.20.pl.
>
> I'm suspecting that you have some big files in there, (such as indexes
> from some other mail program). We had code in notmuch to detect and
> ignore these, but a recent bug had broken that.
>
> I just fixed this code as of the below commit. So please update and try
> again and let us know if things work any better.

OK, one of the problems seems to be solved. The info: output shows
that the code now ignores non-email data. The ignored files are
small fragments of real mail, so apparently mb2md made errors
there.

But I have run into a different issue. I have a lot of files in the
Maildir which contain base64-encoded binary data. (Some remote site
sends me its daily backup logs.) Those files are each 2.4 megabytes
in size. By adding some debug code to notmuch-new.c, I found out that
the program becomes very slow and consumes a lot of memory when adding
these files. I just killed it when it consumed 2 GByte again.

So as you suspected, the problem seems to stem from large files. But
those large files are not indices or stuff like that from different
mail programs, but they are valid emails which contain a lot of
(encoded) binary data.

Perhaps we should be able to configure notmuch so that it ignores
all mails that match a specific pattern (like "Subject: Backup logs
from.*").
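
To make the idea concrete, here is a rough sketch in Python (not C, and
not notmuch's actual code). The function name should_ignore and the
IGNORE_PATTERNS list are made up for illustration; a real version would
read the patterns from the notmuch configuration, which is an assumption
on my side.

```python
import re
from email.parser import BytesParser

# Hypothetical ignore list; the pattern is the one from the example above.
IGNORE_PATTERNS = [re.compile(r"Subject: Backup logs from.*")]


def should_ignore(raw_message, patterns=IGNORE_PATTERNS):
    """Return True if any header line of the message matches a pattern."""
    # headersonly=True stops header parsing at the first blank line,
    # so the (possibly huge) body is not parsed as MIME.
    headers = BytesParser().parsebytes(raw_message, headersonly=True)
    for name, value in headers.items():
        line = f"{name}: {value}"
        if any(p.match(line) for p in patterns):
            return True
    return False


if __name__ == "__main__":
    backup = b"From: host@example.com\nSubject: Backup logs from host1\n\nQUJD\n"
    normal = b"From: friend@example.com\nSubject: Hello\n\nHi there\n"
    print(should_ignore(backup), should_ignore(normal))
```

Such a check would have to run before a message is handed to the
indexer, so that the large files are skipped early.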

Regards
Dominik
