Re: [notmuch] Notmuch's search view sucks

Date: Fri, 4 Dec 2009 10:36:45 +0000 (UTC)

Karl Wiberg writes:
> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth wrote:
> > And a step beyond that would support different languages for
> > different emails, but that sounds like something "hard" to identify.
> 
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists?

There's TextCat:

http://www.let.rug.nl/vannoord/TextCat/

It looks at n-gram frequencies, and can guess pretty reliably from
even a fairly small amount of text.
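To give a feel for how little code the n-gram approach needs, here is a minimal Python sketch of rank-based n-gram profiling, in the style of the Cavnar and Trenkle technique that TextCat is, as far as I know, built on. This is not TextCat's code: the function names, the underscore word padding, the n-gram lengths (1 to 3), the profile size of 300, and the two tiny training samples are all assumptions picked for illustration. A real setup would train each language profile on far more text.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (illustrative choice)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so ranks are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile cost the maximum."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data, far too small for real use (hypothetical samples).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "sleeps in the sun while the fox runs through the field",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann "
              "schläft der hund in der sonne während der fuchs durch das feld läuft",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

Calling guess_language(message_body, PROFILES) returns the key of the closest profile. Comparing ranks rather than raw frequencies is what lets short texts work reasonably well: only the relative ordering of the common n-grams has to match.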

TextCat is written in Perl.  I don't know whether there's a C or C++
implementation, but it isn't a huge piece of code; finding a good
technique was the clever part.

Cheers,
    Olly
