> just remove it), but along the way of searching and viewing mail, I've
> encountered quite a few occurrences of failing to UnicodeEncode. An example
> backtrace looks like this:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 239, in
> process
>     return self.handle()
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 230, in
> handle
>     return self._delegate(fn, self.fvars, args)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 420, in
> _delegate
>     return handle_class(cls)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 396, in
> handle_class
>     return tocall(*args)
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 153,
> in GET
>     sprefix=webprefix)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989,
> in render
>     return self.environment.handle_exception(exc_info, True)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754,
> in handle_exception
>     reraise(exc_type, exc_value, tb)
>   File "templates/show.html", line 1, in top-level template code
>     {% extends "base.html" %}
>   File "templates/base.html", line 32, in top-level template code
>     {% block content %}
>   File "templates/show.html", line 12, in block "content"
>     {% for part in format_message(m.get_filename(),mid): %}{{ part|safe
> }}{% endfor %}
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 245,
> in format_message_walk
>     tags=safe_tags).encode(part.get_content_charset('ascii')))
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
> position 1141: ordinal not in range(256)
>
> 127.0.0.1:60968 - - [31/Oct/2017 17:00:02] "HTTP/1.1 GET /show/
> 665d8c5c2b024898ae21951c4b8b4f93@CO2PR05MB747.namprd05.prod.outlook.com" -
> 500 Internal Server Error
>
> I'm no Python expert, but from a quick google it would seem like the cause
> of such an exception is related to not using utf-8.

Neat.
So to get there, this has to be a text/html part. It has to have been
decoded, either with the declared charset or with ascii. If a \u201c (left
double quotation mark) showed up, it didn't get decoded as ascii---and
indeed, it looks like the content-type specifies latin-1. But now when we
try to encode back, using the same latin-1, it fails? That's really neat.

> Brian - do you think something needs modifying in nmweb.py to cater for
> this type of thing, or is this somehow related my own mailstore (not sure
> why that would be as my messages haven't been modified).

Lots of mail has busted encoding. I've done some defensive work against
that---look at decodeAnyway and shed a tear for purity---but clearly not
enough. Can you send me a message that causes the problem?

In the meantime, I think line 245 ought to be, appropriately indented:

    tags=safe_tags).encode(part.get_content_charset('ascii'), 'xmlcharrefreplace'))

Thanks for the report---investigating it showed me that the search box
doesn't tolerate that character either.

-Brian
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
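
For anyone who wants to see the failure and the effect of 'xmlcharrefreplace' in isolation, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not taken from nmweb.py or from the offending message; the charset is assumed to be latin-1 because that is the codec named in the traceback. It is written so it should behave the same under Python 3, although the report above is from Python 2.7.

```python
# Hypothetical stand-in for the rendered text/html part (not the real message).
rendered = u"<p>\u201cquoted\u201d text</p>"

# Assumption: this is what part.get_content_charset('ascii') returned,
# going by the codec named in the UnicodeEncodeError.
charset = "latin-1"

# Default (strict) encoding fails, because U+201C and U+201D have no
# latin-1 representation.
try:
    rendered.encode(charset)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'xmlcharrefreplace', characters the codec cannot represent are
# written as numeric character references instead of raising.
safe = rendered.encode(charset, "xmlcharrefreplace")

print(strict_failed)
print(safe)
```

Since the part is HTML, a browser should render the numeric references as the original quotation marks, which is why this error handler seems a reasonable fit here and would not be for a text/plain part.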