Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
when a tag or message ID contains non-ASCII characters [1].

There are a number of Python bugs associated with this behavior
[2,3,4,5,6], and there's also some useful background in [8].  [3] led
to the currently working Python 3 implementation, which encodes to
UTF-8 by default and has 'encoding' and 'errors' arguments [7].  This
commit follows that approach in a way that's compatible with both
Python 2 and Python 3.

Coercing to UTF-8 (regardless of locale) gives us consistent quoted
tag and message IDs for sharing between users.

The 'isnumeric' check identifies Unicode instances in both Python 2
[9] and Python 3 [10]: only the text types (unicode in Python 2, str in
Python 3) have that method, so byte strings pass through unencoded.

[1]: id:87twlbv5vj.fsf@zancas.localnet
     http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
     Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
     Date: Sun, 14 Feb 2016 08:22:24 -0400
[2]: http://bugs.python.org/issue2637
[3]: http://bugs.python.org/issue3300
[4]: http://bugs.python.org/issue22231
[5]: http://bugs.python.org/issue23885
[6]: http://bugs.python.org/issue1712522
[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
---
I haven't checked the other commands for issues with Unicode IDs or
tags.  It's possible that, in addition to this explicit encoding to
UTF-8, we'll also want explicit decoding from UTF-8 when reading from
Git trees (for 'nmbug checkout' and 'nmbug status').
Cheers,
Trevor

 devel/nmbug/nmbug | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
index 81f582c..284d374 100755
--- a/devel/nmbug/nmbug
+++ b/devel/nmbug/nmbug
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2011-2014 David Bremner <david@tethera.net>
+# Copyright (c) 2011-2016 David Bremner <david@tethera.net>
 # W. Trevor King <wking@tremily.us>
 #
 # This program is free software: you can redistribute it and/or modify
@@ -95,7 +95,7 @@ except AttributeError: # Python < 3.2
     _tempfile.TemporaryDirectory = _TemporaryDirectory
 
 
-def _hex_quote(string, safe='+@=:,'):
+def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
     """
     quote('abc def') -> 'abc%20def'.
 
@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
     addition to letters, digits, and '_.-') and lowercase hex digits
     (e.g. '%3a' instead of '%3A').
     """
+    if hasattr(string, 'isnumeric'):
+        string = string.encode(encoding, errors)
+    if hasattr(safe, 'isnumeric'):
+        safe_bytes = safe.encode(encoding, errors)
+        if len(safe_bytes) != len(safe):
+            raise ValueError(
+                'some safe characters are encoded as multiple bytes '
+                '({!r} -> {!r})'.format(safe, safe_bytes))
+        safe = safe_bytes
     uppercase_escapes = _quote(string, safe)
     return _HEX_ESCAPE_REGEX.sub(
         lambda match: match.group(0).lower(),
-- 
2.1.0.60.g85f0837