Hi David, thanks for replying.

On 2019-Jan-19, David Bremner wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>
> > In my read of the code ultimately comes from
> > g_mime_parser_construct_message rejecting the message.
> > I reported this to GMime, and they said that the problem is that notmuch
> > insert is using the mbox mode:
> > https://github.com/jstedfast/gmime/issues/58
> > (Sample email is attached there).
>
> This issue (or a related one) has come up before
>
> https://nmbug.notmuchmail.org/nmweb/search/postfix+mbox
>
> Generally it seems to be caused by tools that add mbox 'From ' headers,
> without actually mbox escaping the file. We haven't yet reached
> consensus on a good solution (generally people just want to fix their
> own mail, which is understandable). A workaround discussed in the
> messages I reference above is to strip the 'From ' header before passing
> to notmuch-insert. Perhaps some scholar of the RFCs can convince us that
> that is "always" the right thing for notmuch insert to do.

I'm not sure I follow. As I understand it, notmuch does not work with
mboxes, only with maildirs, so splitting emails at "From " is not
strictly necessary: one file always equals one message.

As for RFC scholarship, I spent some time looking at
https://tools.ietf.org/html/rfc5322
to see if it defined any sort of message separator ... but as far as I
can tell, it only defines what a valid message looks like. It doesn't
say where one message ends.

On the other hand, in my world, it's been quite a while since 'From '
was considered a useful message separator. It stopped being one in a
pretty extensive way when git-format-patch messages started being posted
as attachments. But even before that, MUAs stopped adding the ">" at the
start of a "From " line in human-written text. Nowadays what really
governs the split is the MIME multipart boundary.
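
To make the workaround you mention concrete, here is a rough sketch of
stripping a leading mbox 'From ' line before the message is handed to
notmuch-insert. It is only an illustration: the function name
strip_mbox_from_line is made up for this example, and it assumes the
separator, if present, is exactly the first line of the file. A "From "
line anywhere else (for example in the body) is left alone.

```python
def strip_mbox_from_line(raw: bytes) -> bytes:
    """Drop a leading mbox 'From ' separator line, if there is one.

    Hypothetical helper, not part of notmuch. Note that a real
    'From:' header does not match, because the mbox separator is
    'From' followed by a space rather than a colon.
    """
    if raw.startswith(b"From "):
        # The separator occupies a single line; everything after the
        # first newline is the actual RFC 5322 message.
        _, _, rest = raw.partition(b"\n")
        return rest
    return raw


# Example: a delivery agent prepended a separator without mbox-escaping
# the body, so the body still has an unescaped line starting with "From ".
sample = (
    b"From alvherre@example.org Sat Jan 19 10:00:00 2019\n"
    b"Subject: test\n"
    b"\n"
    b"From here on, this is body text.\n"
)
cleaned = strip_mbox_from_line(sample)
```

A delivery script could apply this to the message read from stdin and
pipe the result into "notmuch insert", which reads the message from its
standard input.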

Most tools do not escape lines starting with 'From ' anymore. As far as
I can tell, the multipart boundary is defined by RFC 2046,
https://tools.ietf.org/html/rfc2046#section-5.1.1
which states that the implementation must look for the "boundary
delimiter line". Stopping at a "From " line before finding the boundary
delimiter line would be a mistake, in my reading.

> > As far as I can tell, this is all coming from
> > _notmuch_message_file_parse() which sets the is_mbox flag when it sees
> > the "^From " line at the start of the file ... which kinda makes sense
> > in general terms, but for notmuch-insert I think that's the wrong thing
> > to do. Maybe a solution is to pass a flag down from notmuch-insert.c's
> > add_file all the way down to _notmuch_message_file_parse telling it not
> > to treat the file as an mbox.
>
> I'd be worried about letting notmuch-insert deliver messages that
> notmuch-new would not be able to parse. In particular we'd like to keep
> the property that a Maildir + the output of notmuch-dump should be
> enough to completely recover the notmuch database.

Hmm, that's a good point -- I assume that notmuch-new should be patched
similarly so that those messages are valid there too. So maybe the
solution (given that, as I said above, Notmuch does not appear to handle
mboxes at all) is to just set the is_mbox flag to false unconditionally ...

-- 
Álvaro Herrera
PostgreSQL Expert, https://www.2ndQuadrant.com/