Subject: Encodings
Date: Mon, 11 Jul 2011 16:04:17 +0200
To: Notmuch developer list
From: Sebastian Spaeth

Hi all,
After I was made aware that notmuch's Python bindings behave
differently depending on whether we hand them (byte-based) ASCII strings
or unicode objects, I tried to work out which encodings we should expect
from notmuch and which we should send to it. The answer is that things
are very implicit: notmuch.h speaks of strings but never mentions
encodings, and the Xapian docs don't mention encodings either, though
ojwb confirmed that Xapian expects UTF-8.
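To make the input side concrete: if the C API (and Xapian beneath it) really expects UTF-8, the bindings could normalize every string argument before passing it down, so that str and bytes callers get identical behaviour. A minimal sketch, assuming UTF-8 is the expected input encoding; `to_utf8_bytes` is a hypothetical helper, not part of the current bindings:

```python
def to_utf8_bytes(value):
    """Normalize a str or bytes argument to UTF-8 encoded bytes.

    Hypothetical helper: assumes notmuch/Xapian expect UTF-8 input.
    Text is encoded; bytes are checked for valid UTF-8 and passed through.
    """
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        # Fail early instead of handing invalid UTF-8 down to Xapian.
        value.decode("utf-8")
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

With this, `to_utf8_bytes("Späth")` and `to_utf8_bytes(b"Sp\xc3\xa4th")` produce the same bytes, which is exactly the property the bindings currently lack.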

So, can we document which encoding we are expected to pass to the various
APIs, and where we can guarantee to actually return UTF-8 encoded
strings? For some of the data we read directly from the files, e.g.
arbitrary headers, we can probably be least sure, but are, for example,
the returned tags always UTF-8?

I would love to make the Python bindings return unicode() instances in
cases where we can be sure to actually receive UTF-8 encoded strings.
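On the output side, one possible shape for this: decode strictly where UTF-8 is guaranteed (say, tags, if that turns out to hold) and leniently where it is not (arbitrary headers copied from mail files). A minimal sketch, assuming the C layer hands back raw bytes or None; `from_c_string` and its `trusted` flag are hypothetical, not existing binding API:

```python
def from_c_string(raw, trusted=True):
    """Convert a char* result (received as bytes) into a Python str.

    Hypothetical helper. trusted=True is meant for values assumed to be
    UTF-8 (e.g. tags): a decode error then surfaces as an exception.
    trusted=False is meant for data read from mail files (e.g. arbitrary
    headers): invalid bytes are replaced with U+FFFD instead of raising.
    """
    if raw is None:
        return None
    errors = "strict" if trusted else "replace"
    return raw.decode("utf-8", errors=errors)
```

Decoding strictly for trusted values means a broken guarantee shows up as an error rather than as silently mangled text; the lenient path trades that for robustness against whatever a sender put in a header.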

Encodings make my brain hurt. Unfortunately, one cannot simply ignore them.
