Hi Jani,

Thanks to you and Austin for the comments.

On 1 March 2013, Jani Nikula wrote:
>> I think the background is that RFC 822 defines In-Reply-To (and
>> References too for that matter) as *(phrase / msg-id), while RFC 2822
>> defines them as 1*msg-id. I'd like something about RFC 822 being
>> mentioned in the commit message.
>>
>> The problem in the gmane message you link to in
>> id:87liaa3luc.fsf@gmail.com is likely related to the FAQ item 05.26
>> "How do I fix a bogus In-Reply-To or missing References field?" in
>> the MH FAQ http://www.newt.com/faq/mh.html.

Likely yes.  But I think notmuch should handle these messages, since
they are seen in the wild (and I don’t think you disagree with me on
this point?)

>>
>> As the comment for the function says, we explicitly avoid including
>> self-references. I think I'd err on the safe side and return NULL if
>> the last ref equals message-id.

Done.

>>
>> I don't know how you got this non-change hunk here, but please remove
>> it. :)

That’s what I get for setting my editor to delete trailing whitespace
on save (then not reading outgoing patches carefully).  Fixed.

>> I wonder if you should reuse your parse_references() change here, so
>> you'd set in_reply_to_message_id to the last message-id in
>> In-Reply-To. This might tackle some of the problematic cases
>> directly, but should still be all right per RFC 2822. I didn't verify
>> how the parser handles an RFC 2822 violating free form header though.
>
> Strike that based on http://www.jwz.org/doc/threading.html:
>
> "If there are multiple things in In-Reply-To that look like
> Message-IDs, only use the first one of them: odds are that the later
> ones are actually email addresses, not IDs."

Hmm.  I think it’s a toss-up which of multiple quasi-message-ids is
the real one.  In the email message example I linked upthread, it was
the last one that was real.
I decided to use the last one, because it allows the self-reference
checking to be pushed entirely into parse_references.  If you feel
strongly that we should use the first one, I can change it back.

> I talked to Austin (CC) about the patch on IRC, and his comment was,
> perceptive as always:
>
> 23:38 amdragon Is the logic in that patch equivalent to always using
> the last message ID in references unless there is no references
> header? Seems like it is, but in a convoluted way.
>
> And that's actually the case, isn't it? To make the code reflect that,
> you should use last_ref_message_id, and if that's NULL, fallback to
> in_reply_to_message_id.

Yes.  Fixed.

>
>> I suggest adding an else if branch (or revamp the above if condition)
>> to tackle the missing In-Reply-To header:
>>
>> else if (!in_reply_to_message_id && last_ref_message_id) {
>>     in_reply_to_message_id = last_ref_message_id; }
>
> Strike that, it should be the other way round.

Now that the self-reference check is in parse_references, the
conditional is much simpler.

One additional change I made in this version was to factor out three
calls to “notmuch_message_get_message_id (message)” into a variable
inside the _notmuch_database_link_message_to_parents function, for a
small boost to readability (and perhaps speed, depending on how clever
the compiler is, I guess).

I also added tests – those are the first of two patches that will
follow this email, the second being the code to make them pass.

--
Aaron Ecay
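
For readers following the thread, the parent-selection rule discussed
above (use the last References entry unless it is a self-reference,
otherwise fall back to the last In-Reply-To entry) can be sketched
roughly as below.  This is a standalone illustration, not the patch
itself: the helper names last_non_self_ref and choose_parent are
hypothetical, and the headers are assumed to be already parsed into
arrays of message-id strings, which is not how notmuch stores them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a notmuch function): return the last id in
 * ids[0..n), or NULL if the list is empty or its last entry equals
 * self_id, i.e. the message refers to itself. */
static const char *
last_non_self_ref (const char *const *ids, size_t n, const char *self_id)
{
    assert (self_id != NULL);

    if (n == 0)
	return NULL;
    if (strcmp (ids[n - 1], self_id) == 0)
	return NULL;
    return ids[n - 1];
}

/* Hypothetical helper: prefer the last usable References entry and
 * fall back to the last usable In-Reply-To entry.  Returns NULL when
 * neither header yields a parent. */
static const char *
choose_parent (const char *const *refs, size_t n_refs,
	       const char *const *in_reply_to, size_t n_in_reply_to,
	       const char *self_id)
{
    const char *last_ref = last_non_self_ref (refs, n_refs, self_id);

    if (last_ref != NULL)
	return last_ref;
    return last_non_self_ref (in_reply_to, n_in_reply_to, self_id);
}
```

Because the self-reference check lives in the one helper, the caller
needs only a single NULL test to decide between the two headers.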