On Sat, 18 Oct 2014, Sergei Shilovsky <sshilovsky@gmail.com> wrote:
>> Hi, Sergei. I'm not clear on where exactly you are seeing a problem
>> with this tab in the subject line. Is it showing up somewhere you think
>> it shouldn't?
>
> It is shown in e.g. `notmuch show` as well as
> `notmuch_message_get_header(m, "subject")`
>
>> I'm not sure libnotmuch should be doing any scrubbing of the message
>> contents. The emacs UI does seem to replace the tab with a space,
>> though. Maybe other MUAs should be doing the same?
>
> My point is that this tabulation character does not relate to the
> contents of the header (this might be arguable though) and libnotmuch
> should return the contents, not its representation on file system.

This is folding and unfolding of long header fields in action, described
in [1]. In short, folding happens by inserting CRLF before any WSP, and
unfolding happens by removing any CRLF immediately followed by WSP. The
WSP is preserved unchanged through folding and unfolding. The TAB is not
part of the multi-line representation; it's part of the unfolded content.

If my memory serves me right, many problems lead back to an
interpretation of [2] that you could insert extra WSP while folding. Due
to this interpretation, many agents replace the WSP following a CRLF with
a single space while unfolding. And presumably because of this, buggy
folding in a Python email package that replaces WSP by a TAB while
folding went unnoticed. This problem, in turn, has been spread far and
wide by Mailman 2 through its use of said email package.

In practice it follows that a perfectly good message will have folding
WSP replaced by TAB when it gets transmitted through Mailman 2. Again,
this is all from memory, [citation needed] etc.

Notmuch is not free of a history of its own when it comes to header
unfolding. For historical reasons, we used two header parsers until
recently. One from gmime, and one of our own.
After all of the above, it shouldn't surprise the reader that the parsers
treated folding WSP differently! Our own parser replaced folding WSP with
a single space, while gmime respects the RFC. Starting from 0.18 we only
use gmime to parse headers, which means we're at least consistent, but,
by the GIGO principle, we may see more folding TABs.

I do not think we should work around header folding problems in the lib,
and I'm not sure about the cli either. We should consider replacing TABs
with spaces in notmuch-emacs though (I personally use a
notmuch-show-markup-headers-hook that does that).

HTH,
Jani.

[1] https://tools.ietf.org/html/rfc5322#section-2.2.3
[2] https://tools.ietf.org/html/rfc822#section-3.1
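
To make the difference between the two unfolding behaviours concrete,
here is a minimal Python sketch. It is only an illustration: the function
names `unfold_strict` and `unfold_lenient` are made up for this example,
and this is not the code of notmuch, gmime, or any Python email package.
The first function follows the rule in [1] (drop the CRLF, keep the WSP);
the second mimics agents that replace the folding WSP with a single space.

```python
import re


def unfold_strict(raw: str) -> str:
    """Unfold per RFC 5322 section 2.2.3 (hypothetical helper).

    Removes every CRLF that is immediately followed by WSP (space or
    TAB). The WSP itself is left untouched, so a folding TAB survives.
    """
    return re.sub(r"\r\n(?=[ \t])", "", raw)


def unfold_lenient(raw: str) -> str:
    """Lenient unfolding (hypothetical helper).

    Replaces the CRLF together with the run of WSP that follows it by a
    single space, which hides a TAB introduced by buggy folding.
    """
    return re.sub(r"\r\n[ \t]+", " ", raw)


# A subject line that was folded with a TAB as the folding WSP.
folded = "Subject: a rather long subject line\r\n\tthat got folded"

print(repr(unfold_strict(folded)))   # the TAB is still in the value
print(repr(unfold_lenient(folded)))  # the TAB has become a space
```

With the strict variant the TAB ends up in the header value handed to the
caller, which matches what Sergei is seeing; the lenient variant is the
kind of cleanup a UI could choose to apply on display.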