Re: Handling mislabeled emails encoded with Windows-1252

Subject: Re: Handling mislabeled emails encoded with Windows-1252

Date: Tue, 24 Jul 2018 15:55:54 +0200

To: David Bremner, notmuch@notmuchmail.org

Cc:

From: Sebastian Poeplau


Hi again,

>> Everyone's mail situation is unique, but I haven't noticed this
>> problem. Do you have a mechanical (e.g. scripted) way of detecting such
>> mails? I suppose it could just look for characters in the range 0x80 to
>> 0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
>> own mail would help me think about this problem, I think.
>
> Yes, I guess that should be a good enough heuristic for detecting
> affected mail. I'll try to come up with a simple script and post it
> here.
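For illustration: in ISO-8859-1 the bytes 0x80 to 0x9F are C1 control characters. Windows-1252 assigns most of them to printable characters such as curly quotes, dashes and the euro sign; only 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned. A check along these lines could therefore reasonably cover the whole 0x80 to 0x9F range rather than stopping at 0x95. The Python snippet below decodes a hypothetical sample under both charsets to show the difference.

```python
import unicodedata

# Hypothetical sample: bytes from a body part that claims
# charset=iso-8859-1 but was really written as Windows-1252.
raw = b"\x93Hello\x94 \x96 it costs \x80 5"

for byte in sorted({b for b in raw if 0x80 <= b <= 0x9F}):
    latin1_char = bytes([byte]).decode("iso-8859-1")
    cp1252_char = bytes([byte]).decode("cp1252")
    # ISO-8859-1 yields a C1 control character (Unicode category "Cc");
    # Windows-1252 yields printable punctuation or the euro sign.
    print(f"0x{byte:02X}: iso-8859-1 -> {unicodedata.category(latin1_char)}, "
          f"cp1252 -> {unicodedata.name(cp1252_char)}")
```

Each byte in the sample decodes to an invisible control character under the declared charset but to a quotation mark, an en dash or the euro sign under Windows-1252. That mismatch is what the heuristic relies on.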

Attached is a Python script that checks individual message files and
prints the name of each file it finds to contain mislabeled Windows-1252
text. The heuristic seems to work well on my mail; let me know if you
run into any issues!
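A minimal sketch of such a check, assuming messages are stored one per file (as in a Maildir) and that only body parts declared as ISO-8859-1 need checking. This is not the attached script, whose contents are not reproduced here; the function name `is_mislabeled_cp1252`, the `CP1252_ONLY` set and the list of charset aliases are illustrative choices.

```python
import sys
from email import policy
from email.parser import BytesParser

# Byte values that are C1 controls in ISO-8859-1 but printable in Windows-1252
# (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252).
CP1252_ONLY = set(range(0x80, 0xA0)) - {0x81, 0x8D, 0x8F, 0x90, 0x9D}

# Common spellings of the ISO-8859-1 label (illustrative, not exhaustive).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "iso_8859-1", "latin1", "latin-1"}


def is_mislabeled_cp1252(path: str) -> bool:
    """Return True if any text part labeled ISO-8859-1 contains bytes
    that only make sense as Windows-1252."""
    with open(path, "rb") as f:
        msg = BytesParser(policy=policy.compat32).parse(f)
    for part in msg.walk():
        # Skip multipart containers, attachments such as images, etc.
        if part.get_content_maintype() != "text":
            continue
        charset = (part.get_content_charset() or "").lower()
        if charset not in LATIN1_LABELS:
            continue
        # decode=True undoes quoted-printable/base64 transfer encoding
        # and returns the raw body bytes.
        payload = part.get_payload(decode=True) or b""
        if any(b in CP1252_ONLY for b in payload):
            return True
    return False


if __name__ == "__main__":
    # Print the path of every flagged message file, one per line.
    for path in sys.argv[1:]:
        if is_mislabeled_cp1252(path):
            print(path)
```

Message file paths are passed as command-line arguments. This sketch only inspects message bodies; mislabeled text inside RFC 2047 encoded header words would need a separate check.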

Cheers,
Sebastian


find_mislabeled_cp1252.py (application/octet-stream)
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
