python-notmuch decoding error on a message

Subject: python-notmuch decoding error on a message

Date: Sun, 06 Nov 2011 22:16:03 -0000

To: notmuch@notmuchmail.org

From: Antoine Amarilli


Hello,

python-notmuch crashes with a UnicodeDecodeError when I try to access the
attached message (see the attached log).

I don't know if the encoding of Subject is valid or not, but it would probably
be better anyway to ignore decoding errors and return some approximation of
Subject instead of failing like this.

Any ideas?

Thanks!

-- 
Antoine Amarilli

$ python
Python 2.7.2+ (default, Aug 16 2011, 09:23:59) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import notmuch
>>> db = notmuch.Database()
>>> q = db.create_query("id:test20110928121705.GA3877@example.com")
>>> t = q.search_threads()
>>> for a in t:
...     print a
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/notmuch/thread.py", line 379, in __str__
    thread['subject'] = self.get_subject()
  File "/usr/local/lib/python2.7/dist-packages/notmuch/thread.py", line 311, in get_subject
    return subject.decode('UTF-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 6: invalid continuation byte
>>> 
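
For illustration, the failure in the log can be reproduced without notmuch, and a lenient error handler returns an approximation instead of raising. This is only a minimal sketch: `decode_subject` is a hypothetical helper, not part of python-notmuch, and the raw bytes are an assumption reconstructed from the attached message so that byte 0xe8 sits at position 6, as in the traceback.

```python
# Hypothetical helper, not part of python-notmuch: decode header bytes
# leniently instead of letting UnicodeDecodeError propagate.
def decode_subject(raw):
    # errors='replace' substitutes U+FFFD for each undecodable byte
    return raw.decode('UTF-8', 'replace')

# Assumed raw subject bytes: 0xe8 is not valid UTF-8 in this position.
raw_subject = b'Fwd: 3\xe8me Salon du Livre juridique'

# Strict decoding fails, as in the traceback above.
try:
    raw_subject.decode('UTF-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding returns an approximation of the subject.
approx = decode_subject(raw_subject)
```

With 'replace', the offending byte becomes a single replacement character and the rest of the subject survives; 'ignore' would drop the byte silently instead.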
Date: Wed, 28 Sep 2011 14:17:05 +0200
From: nobody@example.com
To: nobody@example.com
Subject: Re: Fwd: =?utf-8?B?M+ht?= =?utf-8?Q?e?= Salon du Livre juridique
Message-ID: <test20110928121705.GA3877@example.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="ZGiS0Q5IWpPtfppv"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Content-Length: 865
Lines: 2

test
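
On the question of whether the Subject encoding is valid: decoding the base64 payload of the first encoded-word by hand suggests it is not. The sketch below uses only the standard library. The Latin-1 reading is an assumption about what the sender intended, not something the message itself states.

```python
import base64

# Payload of the first encoded-word in the Subject: =?utf-8?B?M+ht?=
payload = base64.b64decode('M+ht')  # raw bytes behind the "utf-8" label

# The header claims UTF-8, so check whether the bytes really are UTF-8.
try:
    payload.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Assumption: the text was Latin-1 mislabelled as UTF-8. The second
# encoded-word, =?utf-8?Q?e?=, contributes a plain "e".
guess = payload.decode('latin-1') + u'e'
```

The payload is the three bytes 0x33 0xe8 0x6d, which are not valid UTF-8; read as Latin-1 they give "3èm", so the intended word was most likely "3ème".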

signature.asc (application/pgp-signature)
