On Fri, 04 Dec 2009 06:52:38 -0500, Aaron Ecay <aaronecay@gmail.com> wrote:
> The same algorithm is implemented in C here:
> http://www.mnogosearch.org/guesser/
>
> Licensed under the GPL and includes presets for ~50 languages.

That indeed does look very interesting (at least what I can get from
Google's cache of the website, as the server seems to be down just
now).

Oh, but I can just "apt-get source mnogosearch" and find
src/mguesser.c and src/guesser.c at least.

> A potential drawback is that it doesn't handle raw HTML very well,
> according to the documentation.

Shouldn't really be an issue. Notmuch will already want to de-tagify
HTML before indexing anyway.

-Carl