Quoting Michal Sojka (2012-02-24 11:00:02)
>On Fri, 24 Feb 2012, Serge Z wrote:
>>
>> Quoting Michal Sojka (2012-02-24 04:33:15)
>> >Emails that are encoded differently than as ASCII or UTF-8 are not
>> >indexed properly by notmuch. It is not possible to search for non-ASCII
>> >words within those messages.
>>
>> Ok. But we can preprocess each incoming message right after 'getmail' to
>> convert it from html to text and to utf8 encoding. One solution is to create a
>> separate script for this and make getmail pipe all messages to this script, and
>> then to notmuch. But it would be better if maildir contains original messages
>> only, so the question is: can we make notmuch indexing engine to index
>> preprocessed message while maildir will contain original message - as it was
>> obtained?
>
>Hi,
>
>I'm not big fan of adding "preprocessor". First, I think that both
>reasons you mention are actually bugs and it would be better to fix them
>for everybody than requiring each user to configure some preprocessor.
>Second, depending on what and how would your preprocessor do, the
>initial mail indexing could be a way slower, which is also nothing that
>people want.
>
>Do you have any other use case for the preprocessor besides utf8 and
>html->text conversions?
>
>Cheers,
>-Michal

Well, I don't want to add an external preprocessor either. What I mean
can be considered an architectural decision: the search engine should not
access messages directly, but through a preprocessing layer that handles
different encodings in the body and headers, RFC 2047-encoded headers (if
these are not handled yet), etc.

Anyway, IMHO this would best live in a separate library, which would be
useful for notmuch clients as well as for other mail indexing engines. Or
we should look for an existing library that already does this.
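
To make the "preprocessing layer" idea a bit more concrete, here is a minimal
sketch in Python using only the standard-library email package. It is not how
notmuch works or should work; the function name normalize_for_indexing and the
returned dict layout are made up for illustration. The point is only that the
indexer would see decoded Unicode text while the file in the maildir stays
untouched. Error handling (e.g. unknown charsets) and html->text conversion
are left out.

```python
from email import message_from_bytes, policy


def normalize_for_indexing(raw: bytes) -> dict:
    """Hypothetical helper: decode headers and body to str for an indexer.

    The raw message bytes are only read, never rewritten, so the copy in
    the maildir stays exactly as it was delivered.
    """
    # policy.default yields EmailMessage objects whose header values have
    # RFC 2047 encoded-words already decoded to Unicode.
    msg = message_from_bytes(raw, policy=policy.default)

    headers = {
        name: str(msg[name])
        for name in ("From", "To", "Subject")
        if msg[name] is not None
    }

    # Prefer a text/plain part; fall back to text/html, which is returned
    # as-is here (no html->text conversion in this sketch).
    part = msg.get_body(preferencelist=("plain", "html"))
    # get_content() decodes the transfer encoding and the declared charset.
    body = part.get_content() if part is not None else ""

    return {"headers": headers, "body": body}
```

An indexer would then be handed the returned strings instead of the raw file
contents, whatever charset the sender used.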