Re: excessive thread fusing

Subject: Re: excessive thread fusing

Date: Sun, 20 Apr 2014 13:46:01 -0400

To: notmuch@notmuchmail.org

From: Austin Clements


Quoth myself on Apr 20 at 12:48 pm:
> Quoth Andrei POPESCU on Apr 20 at 12:04 am:
> > On Sb, 19 apr 14, 18:52:02, Eric wrote:
> > > 
> > > This may not actually be any help, but both hypermail and mhonarc agree
> > > that two messages form a separate thread from the rest. I believe that
> > > the latter, at least, is the JWZ algorithm.
> > 
> > mutt concurs.
> 
> Can anyone explain why JWZ *doesn't* have the same problem?  I don't
> see how this heuristic doesn't doom it to the same fate:
> 
>   The References field is populated from the ``References'' and/or
>   ``In-Reply-To'' headers. If both headers exist, take the first thing
>   in the In-Reply-To header that looks like a Message-ID, and append
>   it to the References header.
> 
> Given this, even considering only messages 18 and 52 (which "should"
> be in different threads), JWZ should find the common "parent"
> e.fraga@ucl.ac.uk and link them in to the same thread:
> 
> Add 18 (step 1)
> - The combined "references" list is <ID17> <e.fraga@ucl.ac.uk>
> - Creates and links containers 17 <- e.fraga@ucl.ac.uk <- 18 where the
>   first two are empty
> 
> Add 52 (step 1)
> - The combined "references" list is <ID31> <ID32> <ID39>
>   <e.fraga@ucl.ac.uk>
> - Creates and links containers 31 <- 32 <- 39
> - Also considers container e.fraga@ucl.ac.uk, but this is already
>   linked, so it doesn't change it
> - Creates container 52 and links e.fraga@ucl.ac.uk <- 52 (step 1C)
> 
> 18 and 52 will later get promoted over their empty parent (step 4),
> but will remain in the same thread.
> 
> What am I missing?  Or are these other MUAs not using pure JWZ?
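
For reference, the quoted JWZ heuristic can be sketched as follows.  This
is only a literal reading of the quoted paragraph, not code from any MUA;
the name jwz_references and the Message-ID regex are my own assumptions.

```python
import re

# Assumption: anything of the form <...> without whitespace "looks like"
# a Message-ID.  Real parsers are more forgiving than this.
MSGID = re.compile(r"<[^<>\s]+>")


def jwz_references(references, in_reply_to):
    """Combine the raw header strings into one list of IDs, oldest first.

    Hypothetical helper: takes the IDs in References, then appends the
    first ID-like token of In-Reply-To, as the quoted text describes.
    No de-duplication is attempted.
    """
    refs = MSGID.findall(references or "")
    irt = MSGID.findall(in_reply_to or "")
    if irt:
        refs.append(irt[0])
    return refs


# Message 18 from the walk-through above.
print(jwz_references("<ID17>", "<e.fraga@ucl.ac.uk>"))
```

With the headers assumed for message 52, the same helper yields the
four-element list used in the walk-through.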

I dug into mutt's mutt_sort_threads a bit.  It's not using JWZ,
though it's something similar.  The most salient thing may be how it
handles in-reply-to and references:

1. If a message has both in-reply-to and references, the parent chain
   is the *last* in-reply-to ID and then the references from right to
   left (skipping the last reference ID if it's the same as the last
   in-reply-to ID).  (See also mutt_parse_references.)
2. If a message has only in-reply-to, the parent chain is *all* of the
   IDs in in-reply-to *from right to left* (i.e., the right-most one
   is the immediate parent).
3. If a message has only references, the parent chain is that, from
   right to left.
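
A rough sketch of the three rules above, to make the ordering concrete.
This is my paraphrase in Python, not mutt's C code; mutt_parent_chain is a
made-up name, and both arguments are assumed to be lists of IDs already
parsed in header order (left to right).

```python
def mutt_parent_chain(in_reply_to, references):
    """Return ancestor IDs with the immediate parent first.

    Hypothetical helper illustrating the three cases described above.
    """
    if in_reply_to and references:
        # Rule 1: last in-reply-to ID, then references right to left,
        # skipping the last reference if it repeats that ID.
        parent = in_reply_to[-1]
        refs = references[:-1] if references[-1] == parent else references
        return [parent] + refs[::-1]
    if in_reply_to:
        # Rule 2: all in-reply-to IDs, right to left.
        return in_reply_to[::-1]
    # Rule 3: references only (or nothing), right to left.
    return references[::-1]


print(mutt_parent_chain(["<a>", "<b>"], ["<r1>", "<r2>", "<b>"]))
# -> ['<b>', '<r2>', '<r1>']
```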

Like JWZ, mutt creates and links together "empty containers" as it
scans the parent chain towards the root, though unlike JWZ it stops
when it finds a non-empty container or a container that already has a
parent.
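
The early stop can be sketched like this.  Again, this is only my reading
of the behavior, not mutt's implementation: Container and link_chain are
hypothetical names, I'm assuming the child is still linked to the
container at which the scan stops, and loop detection is left out.

```python
class Container:
    """A thread node; message is None for an "empty container"."""

    def __init__(self, msgid):
        self.msgid = msgid
        self.message = None
        self.parent = None


def link_chain(table, msgid, message, chain):
    """Link a message to its ancestors; chain has the immediate parent first."""
    node = table.setdefault(msgid, Container(msgid))
    node.message = message
    child = node
    for pid in chain:
        if pid == child.msgid:
            continue  # assumption: ignore a self-reference
        parent = table.setdefault(pid, Container(pid))
        # Stop scanning once we reach a container that is non-empty or
        # already has a parent; ancestors past it are not touched.
        stop = parent.message is not None or parent.parent is not None
        child.parent = parent
        if stop:
            break
        child = parent
    return node


table = {}
link_chain(table, "<m1>", "first", ["<p>", "<g>"])
link_chain(table, "<m2>", "second", ["<p>", "<x>"])
print(sorted(table))  # "<x>" is never created: the scan stopped at "<p>"
```

Under pure JWZ the second call would go on to consider the rest of the
chain; here the scan ends at the first already-linked container.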
