Re: [PATCH v3 3/5] Add indexing for the mimetype term

Date: Sat, 17 Jan 2015 16:21:50 +0100

To: Todd, notmuch@notmuchmail.org

From: David Bremner


Todd <todd@electricoding.com> writes:

> Adds the indexing and removes the broken test flag
> ---
>  lib/database.cc        |  1 +
>  lib/index.cc           | 10 ++++++++++
>  test/T190-multipart.sh |  4 ----
>  3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 0d2c417..3974e2e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
>      { "from",			"XFROM" },
>      { "to",			"XTO" },
>      { "attachment",		"XATTACHMENT" },
> +    { "mimetype",		"XMIMETYPE"},
>      { "subject",		"XSUBJECT"},
>  };

I think the commit message should articulate why we are indexing this as
a probabilistic prefix, rather than as a boolean prefix. In particular,
this gives people a last chance to complain.

The reference I know is http://xapian.org/docs/queryparser.html

If I understand correctly (it would be great if you could test this,
Todd), with a probabilistic prefix,

   mimetype:pdf

will match

application/pdf
image/pdf
application/x-pdf
application/x-ext-pdf

but not

application/x-bzpdf
application/x-gzpdf
application/x-xzpdf
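
The expected behaviour above can be illustrated with a minimal sketch. It does not call Xapian at all. It assumes that a probabilistic field is indexed as lowercase words split at every non-alphanumeric character (such as "/" and "-"), and it ignores stemming. The helper names split_words and word_match are hypothetical, and the query is assumed to be a single lowercase word.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split text into lowercase words at every non-alphanumeric character.
// Assumption: this approximates how a probabilistic field is tokenized.
std::vector<std::string> split_words(const std::string &text)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// True if the (lowercase, single-word) query equals one word of the MIME type.
bool word_match(const std::string &mimetype, const std::string &query)
{
    for (const std::string &word : split_words(mimetype))
        if (word == query)
            return true;
    return false;
}
```

Under these assumptions, word_match("application/x-ext-pdf", "pdf") is true because "pdf" is a word of its own, while word_match("application/x-bzpdf", "pdf") is false because "bzpdf" is a single word. Whether the real index behaves this way still needs testing.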

On the whole, this probably does more good than harm.  The downside
of probabilistic prefixes/fields is that they are not "anchored", so
there is no easy way to distinguish

      application/pdf

from

      pdf
      application/x-pdf
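
To make the "anchored" distinction concrete, here is a small sketch contrasting the two styles of matching. It is only an illustration under stated assumptions, not notmuch or Xapian code: anchored_match models a boolean prefix as an exact comparison against the whole value, and unanchored_match models a probabilistic prefix as a match on any whole word. Both function names are hypothetical, and the query passed to unanchored_match is assumed to be a single alphanumeric word.

```cpp
#include <cctype>
#include <string>

// Boolean-prefix style (assumed): the whole value is one term,
// so only an exact match on the full string succeeds.
bool anchored_match(const std::string &mimetype, const std::string &query)
{
    return mimetype == query;
}

// Probabilistic style (assumed): the query matches if it occurs anywhere
// in the value as a whole word, delimited by non-alphanumeric characters.
bool unanchored_match(const std::string &mimetype, const std::string &query)
{
    if (query.empty())
        return false;
    std::size_t pos = mimetype.find(query);
    while (pos != std::string::npos) {
        std::size_t end = pos + query.size();
        bool starts_word = pos == 0 ||
            !std::isalnum(static_cast<unsigned char>(mimetype[pos - 1]));
        bool ends_word = end == mimetype.size() ||
            !std::isalnum(static_cast<unsigned char>(mimetype[end]));
        if (starts_word && ends_word)
            return true;
        pos = mimetype.find(query, pos + 1);
    }
    return false;
}
```

In this model, unanchored_match with the query "pdf" succeeds for "application/pdf", "pdf" and "application/x-pdf" alike, whereas anchored_match only succeeds when the full type is given.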

I guess in a perfect world this would also be explained in
notmuch-search-terms(7), but that's pretty much orthogonal to this
series.

d