Re: correct way to search for only PDF attachments

Subject: Re: correct way to search for only PDF attachments

Date: Tue, 29 Sep 2015 00:51:01 -0400

To: Carl Worth

Cc: notmuch@notmuchmail.org

From: Xu Wang


On Mon, Sep 28, 2015 at 10:00 PM, Carl Worth <cworth@cworth.org> wrote:
> On Mon, Sep 28 2015, Xu Wang wrote:
>> I would like to look for all emails from a colleague jongho. I tried:
>>
>> from:jongho attachment:pdf
>>
>> which seems to do as I wanted.
>
> Good. That should work.
>
>> To understand more, what does the following search for?
>>
>> from:jongho attachment:.*pdf
>
> Uhm, probably only strange things. There are some mechanisms for getting
> notmuch to emit some debugging information on what the final search
> terms end up being, (but I don't recall if they still require
> recompilation or not).
>
> I'm not testing now, but I wouldn't be surprised if that ended up doing
> something like searching for a phrase like "attachment pdf" anywhere
> within a message. (The Xapian parser can be somewhat unpredictable when
> you give it unexpected input.)
>
>> Also, how does the first one above know that I want only PDF
>> attachments and not an attachment called "pdformula.txt" ?
>
> It doesn't know that you want only PDF attachments. The key part is that
> the indexing is performed by breaking text up into individual terms, (at
> punctuation boundaries usually). So a search specification like
> "attachment:pdf" is searching for things that were indexed with the
> "pdf" term within the attachment prefix. So that won't match a filename
> like pdformula.txt, (which would be indexed as two terms, "pdformula"
> and "txt"), but it would match pdf.ormula.txt, (which would be indexed
> as three terms, "pdf", "ormula" and "txt").
>
> The Xapian documentation can be examined if you want more details.
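
As a rough illustration of the term splitting Carl describes, here is a
minimal Python sketch. It is only an approximation: the helper names
(split_terms, attachment_matches) are made up for this example, and the
regular expression merely stands in for Xapian's real term generator,
which has additional rules that are not modeled here.

```python
import re


def split_terms(filename):
    """Approximate indexing: lowercase the name and split it into runs of
    word characters, so punctuation such as '.' acts as a term boundary.
    (Assumption: this only mimics Xapian's behaviour, it is not its code.)"""
    return re.findall(r"\w+", filename.lower())


def attachment_matches(filename, query_term):
    """Hypothetical helper: True if query_term equals one whole indexed
    term of the filename. Substrings of a term do not count."""
    return query_term.lower() in split_terms(filename)


if __name__ == "__main__":
    for name in ("report.pdf", "pdformula.txt", "pdf.ormula.txt"):
        print(name, split_terms(name), attachment_matches(name, "pdf"))
```

Under those assumptions, "pdf" matches report.pdf and pdf.ormula.txt but
not pdformula.txt, in line with the explanation quoted above.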

This is highly useful. Thanks for such an explanation!! Thank you, Carl.

Kind regards,

Xu
