I spent a little time this morning staring at the code, and it seems that all of the message-ids are parsed via g_mime_decode_text, which decodes RFC 2047 encoded-words and guesses at the charset of any raw 8-bit characters. In practice this means that in the notmuch database all headers are UTF-8. Since message-ids are supposed to be printable ASCII (at least in RFC 5322), this does not seem like such a terrible decision, but I wonder if we should document this potential conversion somewhere?

d
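
To make the kind of conversion I mean concrete, here is a minimal sketch. It uses Python's standard email.header module as a stand-in for GMime, so it only illustrates RFC 2047 decoding in general and is not what notmuch actually runs. The helper name decode_message_id and the example message-ids are made up for illustration, and GMime's guessing for raw 8-bit input may well behave differently.

```python
from email.header import decode_header, make_header


def decode_message_id(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value (hypothetical helper)."""
    return str(make_header(decode_header(raw)))


# A well-formed, printable-ASCII message-id passes through unchanged.
plain = "<20240101.abc@example.com>"

# A (non-conforming) message-id sent as a single RFC 2047 encoded-word.
# =3C, =E9, =40 and =3E are "<", "é", "@" and ">" in ISO-8859-1.
encoded = "=?iso-8859-1?q?=3Ccaf=E9=40example.com=3E?="

decoded_plain = decode_message_id(plain)
decoded_encoded = decode_message_id(encoded)

print(decoded_plain)
print(decoded_encoded)

# What a UTF-8 store would hold differs from the bytes in the raw header.
stored_bytes = decoded_encoded.encode("utf-8")
print(stored_bytes)
```

The point is that a search for the raw header text would not match the decoded form in the second case, which is why a note in the documentation might be worthwhile.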