On Sun, 18 Mar 2018, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
>  * if we know our index expects english, and we have a message part that
>    *is not* english (e.g. Content-Language: es), we could avoid indexing
>    that part.

Why would we do that? Search mostly works just fine for non-English
languages; it's just that the *stemming* is not right.

> what do you think? what ideas are missing from the brainstorm above? I'd
> love to hear from people with multilingual mailboxes about how we might
> be able to make notmuch work better for them.

With my limited understanding of this, stemming happens both at indexing
and at searching. Basically, at indexing time the term generator indexes
both the full and the stemmed version of each word.

I'm wondering if we could look at Content-Language (and, failing that,
heuristics), and (if the user so desires) use multiple term generators
with different stemmers on a per-document basis. Or use non-stemming
indexing for unidentified or unsupported languages. How far would that
take us?

Then, perhaps, we could also perform language-specific queries?

I don't know how feasible that is, or if it would require Xapian
changes.

BR,
Jani.
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
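
To make the per-document stemmer idea a bit more concrete, here is a rough
sketch of the selection step only, using Python's standard email module. It
is not notmuch code and does not touch Xapian. The function name, the
mapping table, and the choice of "none" as the fallback are all assumptions
made for illustration; the language names are meant to correspond to
stemmers Xapian ships, but treat that as unverified here.

```python
from email import message_from_string

# Hypothetical table: primary Content-Language subtag -> stemmer name.
# The names are assumed to match stemmers available in Xapian.
STEMMER_BY_LANGUAGE = {
    "en": "english",
    "es": "spanish",
    "de": "german",
    "fi": "finnish",
    "fr": "french",
}

# Assumed fallback: index without stemming when the language is
# unidentified or unsupported.
NO_STEMMING = "none"


def stemmer_for_part(part):
    """Pick a stemmer name for one message part from Content-Language."""
    header = part.get("Content-Language")
    if not header:
        return NO_STEMMING
    # Content-Language may list several tags ("en-GB, fi"); use the first.
    first_tag = header.split(",")[0].strip().lower()
    # Keep only the primary subtag ("en-gb" -> "en").
    primary = first_tag.split("-")[0]
    return STEMMER_BY_LANGUAGE.get(primary, NO_STEMMING)


if __name__ == "__main__":
    part = message_from_string("Content-Language: es\n\nhola a todos\n")
    print(stemmer_for_part(part))
```

The result would then decide which term generator (or which stemmer setting
on a single term generator) indexes that part. Heuristic language detection
for parts without the header is left out entirely.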