Re: [PATCH 00/17] nmbug-status: Python-3-compabitility and general refactoring

Subject: Re: [PATCH 00/17] nmbug-status: Python-3-compabitility and general refactoring

Date: Tue, 04 Feb 2014 20:40:18 +0200

To: W. Trevor King

Cc: notmuch@notmuchmail.org

From: Tomi Ollila


On Tue, Feb 04 2014, "W. Trevor King" <wking@tremily.us> wrote:

>
>   >>> from __future__ import unicode_literals
>   >>> import codecs
>   >>> import locale
>   >>> import sys
>   >>> print(locale.getpreferredencoding())  # same as yours
>   UTF-8
>   >>> print(sys.getdefaultencoding())  # same as yours
>   ascii
>   >>> _ENCODING = locale.getpreferredencoding() or sys.getdefaultencoding()
>   >>> print(_ENCODING)  # double-check default encodings
>   UTF-8
>   >>> byte_stream = sys.stdout  # copied from Page.write
>   >>> stream = codecs.getwriter(encoding=_ENCODING)(stream=byte_stream)
>   >>> data = {'from': '\u017b'}  # fake the troublesome data
>   >>> print(type(data['from']))  # double-check unicode_literals
>   <type 'unicode'>
>   >>> string = '  <td>{from}</td>\n'.format(**data)
>   >>> stream.write(string)
>     <td>Ż</td>
>
> It looks like you'll have the same _ENCODING as I do (UTF-8).  That
> means your stream should be wrapped in a UTF-8 StreamWriter, so I
> don't understand why it's converting to ASCII.  Can you run through
> the above on your troublesome machine and confirm that stream.write()
> is still raising the exception?  If it doesn't work, can you just
> paste that whole run in your next email?

I don't know what to paste, so I'll paste this:

$ python
Python 2.6.6 (r266:84292, Nov 21 2013, 12:39:37) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = {'from': '\u017b'}
>>> print(type(data['from'])) 
<type 'str'>
>>> string = '  <td>{from}</td>\n'.format(**data)
>>> print string
  <td>\u017b</td>

and then:

>>> data = {'from': u'\u017b'}
>>> print(type(data['from'])) 
<type 'unicode'>
>>> string = '  <td>{from}</td>\n'.format(**data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in position 0: ordinal not in range(128)

... and ...

>>> import os
>>> print os.environ['LANG']
en_US.UTF-8
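
The second run fails in .format() itself, before any stream is involved. My guess is that this happens because the template there is a plain byte string (no unicode_literals in my session), so Python 2 tries to encode the unicode value with the ASCII codec. A minimal sketch under that assumption: the template is marked with an explicit u prefix, and an in-memory io.BytesIO stands in for sys.stdout so the result can be inspected. The BytesIO buffer and the hard-coded 'UTF-8' are my substitutions, not what Page.write does.

```python
import codecs
import io

# Assumption: an in-memory byte buffer stands in for sys.stdout.
byte_stream = io.BytesIO()

# Wrap the byte stream in a UTF-8 StreamWriter, as in the quoted session.
stream = codecs.getwriter('UTF-8')(byte_stream)

# Explicit u'' prefixes keep both the data and the template unicode,
# with or without "from __future__ import unicode_literals".
data = {'from': u'\u017b'}
string = u'  <td>{from}</td>\n'.format(**data)

# The StreamWriter encodes the unicode text to UTF-8 bytes on write.
stream.write(string)
encoded = byte_stream.getvalue()
```

With a unicode template the formatting step never touches the ASCII codec, and the only encoding happens in the StreamWriter.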


> Thanks,
> Trevor


Tomi
