Carl Worth <cworth@cworth.org> writes:

> Another idea would be to trigger specifically on common forms. Judging
> from the samples in this particular thread, it seems like a workable
> heuristic would be:
>
>     If the In-Reply-To header begins with '<':
>
>         Parse that initial portion as a message ID
>
>     Else if it ends with '>':
>
>         Parse that final portion as a message ID
>
>     Else
>
>         Ignore this garbage-valued header.

Using the hacky script below, I scanned my own mail collection of about
300k messages. I can make the following observations:

 - I have some RFC-compliant In-Reply-To headers with multiple IDs.

 - I have a non-trivial number of the form

       Message from $NAME <address> of $date <id>

 - I didn't see any cases where using the last angle-bracketed thing
   would fail.

 - I did see some cases where the header starts with '<' but the
   matching '>' was missing.

 - I also noticed some RFC 2047 encoding of In-Reply-To headers.

######################################################################
# hacky script follows
dir=${1:?usage: $0 maildir}
echo "Scanning $dir"
tempdir=$(mktemp -d)
echo "Writing to ${tempdir}"
# Extract each message's In-Reply-To value, unfolded onto one line (-c).
find "$dir" -type f -exec sh -c 'formail -c -xIn-reply-to < "$1"' sh {} \; \
    > "${tempdir}/ids"
# Collapse runs of blanks, replace every <...> token with <id> and any
# (...) comment with (comment), so only the distinct shapes remain.
sed -e 's/[[:blank:]][[:blank:]]*/ /g' -e 's/<[^ ]*>/<id>/g' -e 's/(.*)/(comment)/' \
    < "${tempdir}/ids" | sort | uniq | tee "${tempdir}/report"
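To compare Carl's heuristic with the "last angle-bracketed thing" rule that the scan suggests, here is a minimal POSIX sh sketch. The function names first_or_last_id and last_id are hypothetical (not notmuch code), the sample header values are made up to resemble the forms listed above, and neither function attempts RFC 2047 decoding.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers, not notmuch code): extract a
# parent message ID from one unfolded In-Reply-To value passed as $1.
# RFC 2047 encoded values are NOT decoded here.

# Carl's proposed heuristic: a value starting with '<' yields its leading
# <...>; otherwise a value ending with '>' yields its trailing <...>;
# anything else is ignored (no output).
first_or_last_id() {
    case $1 in
        '<'*) printf '%s\n' "$1" | sed -n 's/^<\([^<>]*\)>.*/\1/p' ;;
        *'>') printf '%s\n' "$1" | sed -n 's/.*<\([^<>]*\)>$/\1/p' ;;
    esac
}

# Alternative suggested by the scan: take the last <...> pair anywhere
# in the value. The greedy leading .* makes sed match the final pair.
last_id() {
    printf '%s\n' "$1" | sed -n 's/.*<\([^<>]*\)>.*/\1/p'
}

# Made-up sample values shaped like the forms observed above.
for v in '<a@example.org> <b@example.org>' \
         'Message from Joe <joe@example.org> of Mon, 1 Jan 2001 <x@example.org>' \
         '<junk <x@example.org>'; do
    printf '%s\n  first_or_last_id: %s\n  last_id: %s\n' \
        "$v" "$(first_or_last_id "$v")" "$(last_id "$v")"
done
```

The two rules agree on the "Message from ..." form. They differ on an RFC-compliant header with several IDs (first vs. last), and on a value that starts with '<' but lacks the matching '>', where Carl's rule yields nothing while last_id still recovers the trailing ID.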