On Sun 2018-03-18 21:32:35 +0200, Jani Nikula wrote:
> On Sun, 18 Mar 2018, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
>> * if we know our index expects english, and we have a message part that
>>   *is not* english (e.g. Content-Language: es), we could avoid indexing
>>   that part.
>
> Why would we do that? Search mostly works just fine for non-English
> languages, it's just that the *stemming* is not right.
>
>> what do you think? what ideas are missing from the brainstorm above? I'd
>> love to hear from people with multilingual mailboxes about how we might
>> be able to make notmuch work better for them.
>
> With my limited understanding of this, stemming happens both at indexing
> and searching. Basically at indexing, the term generator indexes both
> the full and the stemmed version of words. I'm wondering if we could
> look at Content-Language (and missing that, heuristics), and (if the
> user so desires) use multiple term generators with different stemmers on
> a per document basis. Or, use non-stemming indexing for unidentified or
> unsupported languages. How far would that take us? Then, perhaps, we
> could also perform language specific queries?
>
> I don't know how feasible that is, or if it would require Xapian
> changes.

thanks, this is exactly the kind of promising idea i was hoping my dumb
questions and half-baked suggestions would provoke :)

Maybe Olly or someone else with deeper knowledge of xapian can weigh in
about the feasibility of this proposal?

        --dkg
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
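
To make the per-document stemmer idea a bit more concrete, here is a toy sketch in plain Python. It is not notmuch or Xapian code: the stemmers are crude suffix strippers invented for illustration, and every name in it (`STEMMERS`, `index_terms`, `matches`) is hypothetical. The only thing borrowed from Xapian is the convention of storing stemmed terms with a `Z` prefix next to the unstemmed ones. Whether the real term generator can be driven this way is exactly the open question above.

```python
# Toy model only: hypothetical names, no notmuch or Xapian API involved.

def stem_en(word):
    # Crude English suffix stripping, for illustration only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_es(word):
    # Crude Spanish suffix stripping, for illustration only.
    for suffix in ("ando", "iendo", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Map of primary language subtag -> stemmer (assumed, not exhaustive).
STEMMERS = {"en": stem_en, "es": stem_es}

def pick_stemmer(language):
    # "en-US" -> "en"; unknown or missing languages get no stemmer.
    primary = (language or "").lower().split("-")[0]
    return STEMMERS.get(primary)

def index_terms(text, content_language=None):
    """Return the set of terms to index for one message part."""
    stemmer = pick_stemmer(content_language)
    terms = set()
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if not word:
            continue
        terms.add(word)                      # always keep the full word
        if stemmer is not None:
            terms.add("Z" + stemmer(word))   # stemmed form, Z-prefixed
    return terms

def matches(query_word, query_language, doc_terms):
    """Language-specific query: try the exact word, then its stem."""
    word = query_word.lower()
    if word in doc_terms:
        return True
    stemmer = pick_stemmer(query_language)
    return stemmer is not None and ("Z" + stemmer(word)) in doc_terms

english = index_terms("indexing messages", "en-US")
spanish = index_terms("buscando mensajes", "es")
unknown = index_terms("indexing messages")   # no Content-Language
```

In this model a part with no recognised language still gets its full words indexed, so exact-word search keeps working and only stemming is lost. A query has to name a language before its stem is tried, which is roughly what "language specific queries" would mean here.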