Quoting Austin Clements (2013-06-23 18:59:39)
> Quoth Justus Winter on Jun 23 at 3:11 pm:
> > Hi,
> >
> > I recently had a problem replying to a mail written by Thomas Schwinge
> > using an oldish notmuch. Not sure if it has been fixed in more recent
> > versions, but I think notmuch could improve upon its header
> > generation (see below). Problematic part of the mail:
> >
> > ~~~ snip ~~~
> > [...]
> > To: someone@example.org, "line
> >  break" <linebreak@example.org>, someoneelse@example.org
> > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> > [...]
> > ~~~ snap ~~~
> >
> > http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> >
> >    Note: Though structured field bodies are defined in such a way that
> >    folding can take place between many of the lexical tokens (and even
> >    within some of the lexical tokens), folding SHOULD be limited to
> >    placing the CRLF at higher-level syntactic breaks.  For instance, if
> >    a field body is defined as comma-separated values, it is recommended
> >    that folding occur after the comma separating the structured items in
> >    preference to other places where the field could be folded, even if
> >    it is allowed elsewhere.
> >
> > So notmuch "rfc-SHOULD" place the newlines after the comma.
> >
> > The rfc goes on:
> >
> >    The process of moving from this folded multiple-line representation
> >    of a header field to its single line representation is called
> >    "unfolding".  Unfolding is accomplished by simply removing any CRLF
> >    that is immediately followed by WSP.  Each header field should be
> >    treated in its unfolded form for further syntactic and semantic
> >    evaluation.
> >
> > My interpretation is that unfolding simply removes any linebreaks
> > first, so the value does not contain any newlines.
> > But Python's email module distinguishes between quoted and unquoted
> > parts of the value:
> >
> > ~~~ snip ~~~
> > from __future__ import print_function
> > import email
> > from email.utils import getaddresses
> >
> > m = email.message_from_string('''To: "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>''')
> > print("m['To'] = ", m['To'])
> > print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> > ~~~ snap ~~~
> >
> > % python3 test.py
> > m['To'] = "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>
> > getaddresses(m.get_all('To')) = [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
> >
> > I believe that is what's preventing me from replying to the message
> > using alot without sanitizing the To header first. Not really sure who
> > is wrong or right here... any thoughts?
>
> There are at least two bugs here.  Regardless of what we RFC-should
> do, that folding *is* permitted by RFC2822, since quoted
> strings can contain folding whitespace:
>
>   http://tools.ietf.org/html/rfc2822#section-3.2.5
>
> For completeness, the full derivation for this "To" header is:
>
>   to = "To:" address-list CRLF
>   address-list = (address *("," address)) / obs-addr-list
>   address = mailbox / group
>   mailbox = name-addr / addr-spec
>   name-addr = [display-name] angle-addr
>   display-name = phrase
>   phrase = 1*word / obs-phrase
>   word = atom / quoted-string
>   quoted-string = [CFWS]
>                   DQUOTE *([FWS] qcontent) [FWS] DQUOTE
>                   [CFWS]
>
> Do you happen to know how the strangely folded "to" header was
> produced for this message?

No, but Thomas might. Thomas, the problematic message is
id:877ghpqckb.fsf@kepler.schwinge.homeip.net

> In notmuch-emacs, a user can put whatever
> they want in a message-mode buffer's headers and mm will dutifully
> pass it on to their MTA.
> We could validate it, but that's a slippery
> slope and I would hope that the MTA itself is validating it (and
> probably more thoroughly than we could).
>
> That said, the first bug here is in Python.  As I mentioned above,
> foldable whitespace is allowed in quoted strings.  In fact, though the
> standard is rather long-winded about whitespace, if you dig into the
> grammar, you'll find that *all whitespace can be folded* (except in
> the obsolete grammar, which allowed whitespace between the header name
> and the colon, which obviously can't be folded).  I'm not sure what
> Python is doing, but I bet it's going to a lot of effort to
> mis-implement something very simple.

Yes, I'm glad you came to the same conclusion.

> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message.  I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue.  I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

No, alot does not use notmuch's reply command.

Thanks,
Justus
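
For illustration, the unfolding rule quoted above (remove any CRLF that is immediately followed by WSP) can be sketched in Python and applied before the header value is handed to getaddresses. This is only a minimal sketch, not what notmuch or alot actually do: the helper name `unfold` is made up for this example, and it accepts a bare LF as well as CRLF because the test message above uses bare newlines.

```python
import re
from email.utils import getaddresses


def unfold(value):
    """Hypothetical helper: drop every line break that is immediately
    followed by a space or tab, keeping that whitespace character."""
    return re.sub(r'\r?\n(?=[ \t])', '', value)


# The header value from the test message above, folded inside the
# quoted display name and again inside the unquoted one.
raw = ('"line\n break" <linebreak@example.org>, '
       'line\n break <linebreak@example.org>')

unfolded = unfold(raw)
print(unfolded)
print(getaddresses([unfolded]))
```

With the line breaks removed first, both display names should come out of the parser without an embedded newline, which is the sanitizing step mentioned above.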