Re: [notmuch] Notmuch's search view sucks

Subject: Re: [notmuch] Notmuch's search view sucks

Date: Fri, 4 Dec 2009 10:36:45 +0000 (UTC)

To: notmuch@notmuchmail.org

Cc:

From: Olly Betts


Karl Wiberg writes:
> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth wrote:
> > And a step beyond that would support different languages for
> > different emails, but that sounds like something "hard" to identify.
> 
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists?

There's TextCat:

http://www.let.rug.nl/vannoord/TextCat/

It looks at n-gram frequencies, and can guess pretty reliably from
even a fairly small amount of text.

TextCat is in Perl.  I don't know if there's a C or C++ implementation
but it isn't a huge piece of code - finding a good technique was the
clever part of it.

Cheers,
    Olly


Thread: