Re: web interface to notmuch

Subject: Re: web interface to notmuch

Date: Tue, 31 Oct 2017 15:21:40 -0400

To: Matthew Lear

Cc:, Vladimir Panteleev, Daniel Kahn Gillmor

From: Brian Sniffen

> just remove it), but along the way of searching and viewing mail, I've
> encountered quite a few occurrences of failing to UnicodeEncode. An example
> backtrace looks like this:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/web/", line 239, in
> process
>     return self.handle()
>   File "/usr/lib/python2.7/dist-packages/web/", line 230, in
> handle
>     return self._delegate(fn, self.fvars, args)
>   File "/usr/lib/python2.7/dist-packages/web/", line 420, in
> _delegate
>     return handle_class(cls)
>   File "/usr/lib/python2.7/dist-packages/web/", line 396, in
> handle_class
>     return tocall(*args)
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/", line 153,
> in GET
>     sprefix=webprefix)
>   File "/usr/lib/python2.7/dist-packages/jinja2/", line 989,
> in render
>     return self.environment.handle_exception(exc_info, True)
>   File "/usr/lib/python2.7/dist-packages/jinja2/", line 754,
> in handle_exception
>     reraise(exc_type, exc_value, tb)
>   File "templates/show.html", line 1, in top-level template code
>     {% extends "base.html" %}
>   File "templates/base.html", line 32, in top-level template code
>     {% block content %}
>   File "templates/show.html", line 12, in block "content"
>     {% for part in format_message(m.get_filename(),mid): %}{{ part|safe
> }}{% endfor %}
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/", line 245,
> in format_message_walk
>     tags=safe_tags).encode(part.get_content_charset('ascii')))
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
> position 1141: ordinal not in range(256)
> - - [31/Oct/2017 17:00:02] "HTTP/1.1 GET /show/
>" -
> 500 Internal Server Error
> I'm no Python expert, but from a quick google it would seem like the cause
> of such an exception is related to not using utf-8.

Neat.  So to get there, this has to be a text/html part.  It has to have
been decoded, either with the declared content type or with ascii.  If a
\u201c (left double quote) showed up, it didn't get decoded as
ascii---and indeed, it looks like the content-type specifies latin-1.
But now when we try to encode back, using the same latin-1, it fails?
That's really neat.
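
A minimal sketch of how that round trip can fail. It assumes the body was really Windows-1252 mislabeled as latin-1, which is only a guess until the offending message turns up. The sample bytes and variable names are made up for illustration. The last line shows one tolerant option for HTML output, the standard 'xmlcharrefreplace' error handler, which is not necessarily what notmuch-web should do.

```python
# Hypothetical reproduction: the real message has not been examined.
raw = b'\x93quoted\x94'        # curly quotes as cp1252 bytes (assumed input)
text = raw.decode('cp1252')    # gives U+201C ... U+201D

# Encoding back with the declared charset fails, because latin-1 has no
# code point for U+201C or U+201D.
try:
    text.encode('latin-1')
    failed = False
except UnicodeEncodeError:
    failed = True

# For HTML output, unencodable characters can instead be emitted as
# numeric character references.
tolerant = text.encode('latin-1', 'xmlcharrefreplace')
```

With that error handler, the curly quotes come out as &#8220; and &#8221;, which a browser renders correctly whatever charset the page declares.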

> Brian - do you think something needs modifying in to cater for
> this type of thing, or is this somehow related to my own mailstore (not sure
> why that would be as my messages haven't been modified).

Lots of mail has busted encoding.  I've done some defensive work against
that---look at decodeAnyway and shed a tear for purity---but clearly not
enough.  Can you send me a message that causes the problem?
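
For a rough idea of what defensive decoding can look like, here is a sketch. The helper name decode_leniently and the order of fallback charsets are assumptions for illustration. This is not the decodeAnyway code in notmuch-web.

```python
def decode_leniently(raw, declared=None):
    """Hypothetical helper: try the declared charset, then common fallbacks."""
    for charset in (declared, 'utf-8', 'cp1252'):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: move on to the next candidate.
            continue
    # latin-1 maps every byte to a code point, so this last resort cannot fail.
    return raw.decode('latin-1')
```

The result is always text, though for badly mislabeled mail it may be the wrong text. That is the trade against raising an error in the middle of rendering a page.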

In the meantime, I think line 245 ought to be, appropriately indented:


Thanks for the report---investigating it showed me that the search box
doesn't tolerate that character either.

notmuch mailing list