Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated

Date: Sun, 03 Mar 2013 18:46:18 -0500

To: Jani Nikula, notmuch@notmuchmail.org

Cc:

From: Aaron Ecay


Hi Jani,

Thanks to you and Austin for the comments.

On March 1, 2013, Jani Nikula wrote:
>> I think the background is that RFC 822 defines In-Reply-To (and
>> References too for that matter) as *(phrase / msg-id), while RFC 2822
>> defines them as 1*msg-id. I'd like something about RFC 822 being
>> mentioned in the commit message.
>> 
>> The problem in the gmane message you link to in
>> id:87liaa3luc.fsf@gmail.com is likely related to the FAQ item 05.26
>> "How do I fix a bogus In-Reply-To or missing References field?" in
>> the MH FAQ http://www.newt.com/faq/mh.html.

Likely yes.  But I think notmuch should handle these messages, since
they are seen in the wild (and I don’t think you disagree with me on
this point?)


>> 
>> As the comment for the function says, we explicitly avoid including
>> self-references. I think I'd err on the safe side and return NULL if
>> the last ref equals message-id.

Done.

>> 
>> I don't know how you got this non-change hunk here, but please remove
>> it. :)

That’s what I get for setting my editor to delete trailing whitespace on
save (then not reading outgoing patches carefully).  Fixed.

>> I wonder if you should reuse your parse_references() change here, so
>> you'd set in_reply_to_message_id to the last message-id in
>> In-Reply-To. This might tackle some of the problematic cases
>> directly, but should still be all right per RFC 2822. I didn't verify
>> how the parser handles an RFC 2822 violating free form header though.
> 
> Strike that based on http://www.jwz.org/doc/threading.html:
> 
> "If there are multiple things in In-Reply-To that look like
> Message-IDs, only use the first one of them: odds are that the later
> ones are actually email addresses, not IDs."

Hmm.  I think it’s a toss-up which of multiple quasi-message-ids is the
real one.  In the email message example I linked upthread, it was the
last one that was real.  I decided to use the last one, because it
allows the self-reference checking to be pushed entirely into
parse_references.  If you feel strongly that we should use the first
one, I can change it back.
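To make the "last one wins" behaviour concrete, here is a minimal standalone sketch. It is not the code from the patch: the names extract_message_ids and last_reference are hypothetical, it uses std::string rather than notmuch's talloc-based strings, and it treats anything between angle brackets as a candidate message-id.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: collect every "<...>" token in a header value,
// skipping any free-form text (RFC 822 "phrase") between them.
static std::vector<std::string>
extract_message_ids (const std::string &header)
{
    std::vector<std::string> ids;
    std::string::size_type pos = 0;

    while ((pos = header.find ('<', pos)) != std::string::npos) {
        std::string::size_type end = header.find ('>', pos);
        if (end == std::string::npos)
            break; // unterminated "<": stop scanning
        ids.push_back (header.substr (pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return ids;
}

// Hypothetical stand-in for the idea discussed above: return the last
// candidate id in the header, or nothing if the header has no ids or
// the last id is the message's own id (a self-reference).
static std::optional<std::string>
last_reference (const std::string &header, const std::string &own_id)
{
    std::vector<std::string> ids = extract_message_ids (header);

    if (ids.empty () || ids.back () == own_id)
        return std::nullopt;
    return ids.back ();
}
```

With a header such as "Message from Foo <foo@example.com> <real-id@example.com>", this sketch picks real-id@example.com, which is the outcome wanted for the message linked upthread. The first-one rule from the jwz document would pick the email address instead.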

> I talked to Austin (CC) about the patch on IRC, and his comment was,
> perceptive as always:
> 
>  23:38 amdragon Is the logic in that patch equivalent to always using
> the last message ID in references unless there is no references
> header?  Seems like it is, but in a convoluted way.
> 
> And that's actually the case, isn't it? To make the code reflect that,
> you should use last_ref_message_id, and if that's NULL, fallback to
> in_reply_to_message_id.

Yes.  Fixed.

> 
>> I suggest adding an else if branch (or revamp the above if condition)
>> to tackle the missing In-Reply-To header:
>> 
>> else if (!in_reply_to_message_id && last_ref_message_id) {
>> in_reply_to_message_id = last_ref_message_id; }
> 
> Strike that, it should be the other way round.

Now that the self-reference check is in parse_references, the
conditional is much simpler.
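As an illustration only (this is not the hunk from the patch), the simplified choice could look roughly like the sketch below. It assumes both values have already been parsed and are empty when the header is missing or only self-referencing; choose_parent is a hypothetical name, and std::optional stands in for the NULL-able char pointers used in lib/database.cc.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch: prefer the last id from References, and fall
// back to the id from In-Reply-To only when References gave nothing.
static std::optional<std::string>
choose_parent (const std::optional<std::string> &last_ref_message_id,
               const std::optional<std::string> &in_reply_to_message_id)
{
    if (last_ref_message_id)
        return last_ref_message_id;
    return in_reply_to_message_id;
}
```

If both inputs are empty, the result is empty too, and the message gets no parent.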

One additional change in this version: I factored three calls to
“notmuch_message_get_message_id (message)” out into a local variable
inside _notmuch_database_link_message_to_parents, for a small boost to
readability (and perhaps speed, depending on how clever the compiler
is).

I also added tests – those are the first of two patches that will follow
this email, the second being the code to make them pass.

-- 
Aaron Ecay
