Re: [PATCH] Don't bother checking for mbox files

Subject: Re: [PATCH] Don't bother checking for mbox files

Date: Mon, 14 Mar 2016 09:23:21 +0200

To: Jani Nikula, Edward Betts, notmuch@notmuchmail.org

From: Tomi Ollila


On Sun, Mar 13 2016, Jani Nikula <jani@nikula.org> wrote:

> On Sun, 13 Mar 2016, Edward Betts <edward@4angle.com> wrote:
>> Keith Packard <keithp@keithp.com> wrote:
>>> Postfix adds mbox-style From lines when used in combination with
>>> maildrop or .forward files. If they have another line starting with
>>> 'From ' in them, notmuch complains about them not being mail files.
>>> 
>>> If we assume the user hasn't screwed up and misconfigured their mail
>>> system, then we can safely ignore whether the file started with an
>>> mbox header and just parse it as a single-message file.
>>
>> I think it is fine to go ahead with this change. At the same time the
>> behaviour of Postfix should be corrected so it doesn't add mbox-style From
>> lines to mails in maildir format.
>
> I disagree with making the change (as-is, at least).
>
> In general, Notmuch does not support mboxes. We expect maildir style one
> message per file mail storage. We support single-message mboxes as a
> special case, in part because, as you note, there's plenty of other
> software that adds the mbox "From " line even though delivering to
> maildir.
>
> I think it's misleading and confusing to the users to accept and index
> the first message of mboxes, and silently ignore the rest (or worse,
> index all of the mbox and associate the text with the first message). I
> think we should reject multi-message mboxes, because we have no code to
> handle them. This patch throws away that check.
>
> Now, IIUC, the problem here is not that the files actually are
> multi-message mboxes. We could use a sample message (even a crafted one)
> that exhibits the problem, so we could add a test case, and fix Notmuch
> to deal with it gracefully (if we decide catering to potentially broken
> other software is the way to go), while retaining the code to reject
> multi-message mboxes. With the test case, we'd also avoid accidentally
> breaking this in the future.

I agree with Jani; a user could accidentally index an mbox with multiple
messages as a single message if this were merged...

We currently have a very simple check: the first line starts with 'From ',
and any later line starting with 'From ' is taken to separate messages.
After a quick look at these 'mbox*' "specs", this may just be within the
"standard".
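
To make the check described above concrete, here is a minimal sketch in
Python. It is not notmuch's actual code; the function name is made up for
illustration, and it only mirrors the rule as stated: the first line starts
with 'From ' and some later line does too.

```python
def looks_like_multi_message_mbox(lines):
    """Naive check: first line starts with 'From ' and a later line does too.

    Hypothetical helper for illustration only, not taken from notmuch.
    """
    it = iter(lines)
    first = next(it, "")
    if not first.startswith("From "):
        # Not an mbox at all (no leading 'From ' line).
        return False
    # Any further line starting with 'From ' is treated as a separator.
    return any(line.startswith("From ") for line in it)
```

Note that a single-message file whose body happens to contain a line
starting with 'From ' is flagged by this rule, which is the false positive
this thread is about.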

In mboxviewfs I checked that there is at least one empty line before
'^From' (might not be required by the standard, but whatever ;/) and that
at least a 'Date:' header follows (needed for the file "time")... but
even this heuristic may not be enough if we wanted to go deep into
this (e.g. there are emails which quote the beginning of an mbox file; no
heuristic can handle that unless there is human-level AI working on it ;)
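
A rough sketch of the stricter heuristic described above, in Python. This
is not the mboxviewfs code; the function name and details are assumptions
made for illustration. A 'From ' line counts as a message start only if it
is the first line or follows an empty line, and a 'Date:' header appears
before the next empty line.

```python
def count_mbox_messages(lines):
    """Count message starts using the stricter heuristic.

    Hypothetical helper for illustration only, not taken from mboxviewfs.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    count = 0
    for i, line in enumerate(lines):
        if not line.startswith("From "):
            continue
        # Require an empty line before the separator (or start of file).
        if i > 0 and lines[i - 1] != "":
            continue
        # Require a 'Date:' header somewhere in the header block that follows.
        for header in lines[i + 1:]:
            if header == "":
                break  # end of headers, no Date: found
            if header.lower().startswith("date:"):
                count += 1
                break
    return count
```

A file would then be treated as a multi-message mbox only when the count is
greater than one. As noted, a mail that quotes the beginning of an mbox
file (empty line, 'From ' line, 'Date:' header) still fools this.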

OTOH, presumably

https://github.com/GNOME/gmime/blob/master/tests/data/mbox/input/substring.mbox

contains 3 messages (or what??!!11)

...

Perhaps the simplest is to give users the possibility to use a 'footgun'
option in notmuch new (notmuch insert probably doesn't need it ???) which
skips the 'mbox' check (I was going to suggest a configuration option, but
as we don't support that in bindings, ...). But of course some of the
simplicity is gone when one forgets to give the --footgun option: the next
notmuch new with the footgun option probably will not pick up the mail file
again (or we have to hold off updating the directory mtime indefinitely, or
make other changes, i.e. more complicated ones which no-one reviews(*)
anyway >;/)


> BR,
> Jani.

Tomi

(*) Although when someone sends less-trivial-than-usual patches which
provide significant progress in functionality, those are reviewed
promptly by a relatively good number of reviewers...

One 'other change' could be e.g. to keep a list of files that have been
failing due to this and retry those if the footgun option is given.
