Re: should filter out replies when indexing

Subject: Re: should filter out replies when indexing

Date: Sun, 09 Mar 2025 07:58:36 -0700

To: Martin Monperrus, notmuch@notmuchmail.org

Cc:

From: Carl Worth


Thanks for the note, Martin.

I have a hard time considering this a bug, though.

Notmuch is indexing the content of every message and searching for that
content correctly.

Imagine a case where you receive a message that is a forward of a
reply. In this case, you will not have the original message, and if
notmuch never indexed the quoted content then there would be no way for
you to search for and find the content.

So, notmuch must necessarily index the content.

What is missing is a way to be able to indicate that you want to search
for content that is in a message but not part of the quoted content.

What notmuch could do to support a feature like that is to index all
quoted content with a different term prefix than it does unquoted
content. Then, by default search could be made to match on both
terms. And new search syntax could be added to search specifically for
unquoted content.
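
A minimal sketch of that idea, using a toy in-memory index in Python
rather than notmuch's actual Xapian-backed indexer. The prefixes "B" and
"Q" and the functions index_message and search are hypothetical names
made up for illustration, and treating every line that starts with ">"
as quoted is a simplifying assumption:

```python
import re
from collections import defaultdict

# Hypothetical prefixes for this sketch; not real notmuch term prefixes.
BODY_PREFIX = "B"
QUOTED_PREFIX = "Q"

def index_message(index, msg_id, body):
    """Add each word of body to index under a prefix chosen per line."""
    for line in body.splitlines():
        # Assumption: any line starting with ">" is quoted content.
        quoted = line.lstrip().startswith(">")
        prefix = QUOTED_PREFIX if quoted else BODY_PREFIX
        for word in re.findall(r"\w+", line.lower()):
            index[prefix + word].add(msg_id)

def search(index, word, unquoted_only=False):
    """Match both prefixes by default; unquoted_only skips quoted terms."""
    word = word.lower()
    hits = set(index.get(BODY_PREFIX + word, ()))
    if not unquoted_only:
        hits |= index.get(QUOTED_PREFIX + word, set())
    return sorted(hits)

index = defaultdict(set)
index_message(index, "m1", "foobar")
index_message(index, "m2", "> foobar\nI agree")
index_message(index, "m3", ">> foobar\n> I agree\nMe too")

print(search(index, "foobar"))                      # ['m1', 'm2', 'm3']
print(search(index, "foobar", unquoted_only=True))  # ['m1']
```

The default search still finds quoted text, which covers the
forwarded-reply case, while the restricted search returns only the
message in which the text is new.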

Definitely not a trivial change, but if someone is really motivated by
wanting the feature, that should be possible at least.

-Carl

On Sun, Mar 09 2025, Martin Monperrus wrote:
> Hi Notmuch team, Here is a bug report. Thanks, --Martin
>
> ## Actual behavior
>
> Notmuch indexes all messages including the replied content.
>
> This is a problem because when one searches for a message by its content,
> one also gets every email replying to it.
>
> m1 with "foobar"
> -> m2 with "> foobar"
> -> m3 with ">> foobar"
>
> search("foobar") == [m1, m2, m3]
>
> In the case of dozens of messages in a thread, one does not know which one to open.
>
> ## Expected behavior
>
> Notmuch indexes messages after having stripped the original.
>
> search("foobar") == [m1]
>
> ## Notes
>
> There are different libraries for stripping the replied message. For example
>
> Ruby: https://github.com/github/email_reply_parser
> Python: https://github.com/zapier/email-reply-parser
> Python: https://github.com/mailgun/talon
> Python: https://github.com/lawrencepit/email_reply_parser
> Python: https://github.com/alfonsrv/mailparser-reply
> Python: https://github.com/closeio/quotequail/
> JavaScript: https://github.com/turt2live/node-email-reply-parser
> Java: https://github.com/Driftt/EmailReplyParser
> PHP: https://github.com/willdurand/EmailReplyParser
> Golang: https://github.com/web-ridge/email-reply-parser
>
>
> _______________________________________________
> notmuch mailing list -- notmuch@notmuchmail.org
> To unsubscribe send an email to notmuch-leave@notmuchmail.org
