David Edmondson <dme@dme.org> writes:

> On Sun, Mar 13 2016, Mark Walters wrote:
>> However, it would be sensible to get testing in a greater variety of
>> charsets/encodings
>
> Agreed. Does anyone have suggestions on how we might achieve this? A
> corpus of mail that we could use?

Maybe the notmuch performance corpus, particularly the lkml sample.

    grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u

gives

    euc-kr
    gb2312
    iso-2022-jp
    iso-2022-jp-2
    iso-8859-1
    iso-8859-14
    iso 8859-15
    iso-8859-15
    iso-8859-1
    iso-8859-2
    iso-8859-6
    iso-8859-7
    iso-8859-9
    koi8-r
    koi8-u
    ks_c_5601-1987
    shift_jis
    unknown
    unknown-8bit
    us-ascii
    utf8
    utf-8
    windows-1250
    windows-1251
    windows-1252
    windows-1255

to unpack the corpus

    cd performance-test
    make download-corpus
    ./T00-new.sh --large

probably interrupt the test once notmuch-new starts running.
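
To judge how well each charset is represented, and not only whether it occurs at all, the pipeline above could be extended to count occurrences. The following is a rough sketch, not something taken from the notmuch tree: charset_counts is a hypothetical helper name, and it assumes a grep that supports -r, -h, -o and -i (GNU grep does). It also strips trailing whitespace and carriage returns, which is one possible, unverified reason for an entry showing up twice in a sort -u listing.

```shell
# Hypothetical helper (not part of notmuch): print each declared charset
# found under the directory "$1" with its occurrence count, most frequent first.
charset_counts() {
    # -r recurse, -h omit file names, -o print only the matched text,
    # -i match "charset=" regardless of case
    grep -rhoi 'charset=[^;]*' "$1" |
        # drop the "charset=" prefix, quotes, and trailing whitespace or CR
        sed -e 's/^[^=]*=//' -e 's/"//g' -e 's/[[:space:]]*$//' |
        # fold to lower case so UTF-8 and utf-8 are counted together
        tr 'A-Z' 'a-z' |
        sort | uniq -c | sort -rn
}

# Example invocation, assuming the corpus has already been unpacked:
#   charset_counts performance-test/corpus/mail/lkml
```

Charsets with only a handful of hits would be the ones where a test built on this corpus is likely to be thin.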