Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2

Subject: Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2

Date: Tue, 16 Feb 2016 09:04:07 -0400

To: W. Trevor King,


From: David Bremner

"W. Trevor King" writes:

> Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
> when a tag or message ID contains non-ASCII characters [1].
> There are a number of Python bugs associated with this behavior
> [2,3,4,5,6].  There's also some useful background in [8].  [3] led to
> the currently working Python 3 implementation, which encodes to UTF-8
> by default and has 'encoding' and 'errors' arguments [7].  This commit
> follows that approach in a way that's compatible with both Python 2
> and Python 3.  Coercing to UTF-8 (regardless of locale) gives us
> consistent tag IDs for sharing between users.

I'm not sure what "tag IDs" are. Do you mean message-ids here, or "tags
and IDs"?

At first I thought there might be problems with non-UTF-8 message-ids,
but that turns out not to be the case [1].  It seems like it would take
a fairly heroic effort to get non-UTF-8 tags into the database (perhaps
by calling the library interface with bad strings?), so we can probably
ignore this case. It might be good to document the limitation though,
since AFAIK, dump and restore can roundtrip any old crap.

> The 'isnumeric' check identifies Unicode instances in both Python 2
> [9] and Python 3 [10].

I still haven't really tried to understand this part, but probably it
deserves inline documentation.
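
For the sake of discussion, here is a minimal sketch of what I assume the
check amounts to; it is not taken from the patch. The helper name
`to_utf8_bytes` is made up, and I'm guessing the test is a `hasattr`
lookup. The idea it relies on is that only text strings (`unicode` in
Python 2, `str` in Python 3) have an `isnumeric` method, while byte
strings (`str` in Python 2, `bytes` in Python 3) do not.

```python
def to_utf8_bytes(value, encoding='utf-8', errors='strict'):
    """Hypothetical helper: encode text strings, pass byte strings through.

    Only text types define 'isnumeric' (unicode in Python 2, str in
    Python 3), so its presence is used here as a version-independent
    "is this Unicode text?" test.  Assumption: the patch does something
    similar; this is an illustration, not the patch's code.
    """
    if hasattr(value, 'isnumeric'):
        # Text string: encode explicitly rather than relying on the locale.
        return value.encode(encoding, errors)
    # Already a byte string: leave it untouched.
    return value


encoded = to_utf8_bytes(u'caf\xe9')      # text in, UTF-8 bytes out
unchanged = to_utf8_bytes(b'plain-tag')  # bytes in, same bytes out
```

Something along these lines as a comment next to the check would probably
be enough documentation.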

> ---
> I haven't checked the other commands for issues with Unicode IDs or
> tags.  It's possible that in addition to this explicit encoding to
> UTF-8, we'll also want explicit decoding from UTF-8 when reading from
> Git trees (for 'nmbug checkout' and 'nmbug status').

Yes, this seems to be a problem. With the patch applied I can commit,
but the same UTF-8 message-id then causes problems in 'nmbug status':

bremner@zancas:~/software/upstream/notmuch$ ./devel/nmbug/nmbug status
U	D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@ÃÂÃ¥ðãÃ¥é-ÃÂÃÂ	unread
A	D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ãåðãåé-ÃÃ	unread

bremner@zancas:~/software/upstream/notmuch$ delve -a -1 ~/Maildir/.notmuch/xapian | grep D1B4DEBCAFFC4A05A4D4349A6EC5C9D8
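
The two different spellings of the same ID in the status output above
look like what you get when UTF-8 bytes are decoded with the wrong codec
somewhere on the read path. I haven't verified that this is the actual
cause. The sketch below only illustrates the effect and the explicit
decoding suggested in the patch notes; `decode_git_path` and the sample
ID are made up for the example.

```python
def decode_git_path(raw, encoding='utf-8', errors='strict'):
    """Hypothetical helper: decode a byte string read from Git into text."""
    if isinstance(raw, bytes):
        # Bytes from Git: decode explicitly as UTF-8, whatever the locale.
        return raw.decode(encoding, errors)
    # Already text: nothing to do.
    return raw


# Made-up message ID with two non-ASCII characters.
message_id = u'D1B4@\xe5\xe9'
raw = message_id.encode('utf-8')

# Decoding with the wrong codec turns each non-ASCII character into two,
# so the result no longer compares equal to the ID stored in the database.
mangled = raw.decode('latin-1')

# Decoding explicitly as UTF-8 round-trips the original ID.
restored = decode_git_path(raw)
```

If the comparison in 'nmbug status' sees `mangled` on one side and
`message_id` on the other, it would report the same message as both
added and untracked, which is at least consistent with the output above.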

[1]: id:87si0svnim.fsf@zancas.localnet