Subject: Re: [RFC patch 2/2] lib: index message files with duplicate message-ids

Date: Thu, 16 Mar 2017 21:34:22 -0300

To: Daniel Kahn Gillmor, notmuch@notmuchmail.org

From: David Bremner


Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> On Wed 2017-03-15 21:57:28 -0400, David Bremner wrote:
>> The corresponding xapian document just gets more terms added to it,
>> but this doesn't seem to break anything.
>
> this is an interesting suggestion.  thanks for proposing it!
>
> A couple questions:
>
>  0) what happens when one of the files gets deleted from the message
>     store? do the terms it contributes get removed from the index?
>

That's a good question, and an issue I hadn't thought about.
Currently there's no way to do this short of deleting all of the
terms for all of the files (excepting tags and properties,
presumably) and reindexing. This will require some more thought, I
think.
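
Roughly, I'd imagine the reindexing side looking something like the
following untested sketch against the Xapian C++ API (the prefix
strings are from memory; the real tables live in lib/database.cc):

/* Untested sketch, not actual notmuch code: rebuild a message's
 * document from the files that remain, keeping tags ("K"),
 * properties ("XPROPERTY"), and the message-id ("Q"). */
#include <xapian.h>
#include <string>
#include <vector>

static void
reindex_message (Xapian::WritableDatabase &db, Xapian::docid did,
                 const std::vector<std::string> &remaining_file_terms)
{
    Xapian::Document doc = db.get_document (did);

    /* Collect the terms to drop: everything except tags,
     * properties, and the message-id. */
    std::vector<std::string> doomed;
    for (Xapian::TermIterator t = doc.termlist_begin ();
         t != doc.termlist_end (); ++t) {
        const std::string term = *t;
        if (term.compare (0, 1, "K") != 0 &&
            term.compare (0, 1, "Q") != 0 &&
            term.compare (0, 9, "XPROPERTY") != 0)
            doomed.push_back (term);
    }

    for (const std::string &term : doomed)
        doc.remove_term (term);

    /* Re-add the terms contributed by the files still present in
     * the message store. */
    for (const std::string &term : remaining_file_terms)
        doc.add_term (term);

    db.replace_document (did, doc);
}

The expensive part is regenerating the term list for the remaining
files, since that means re-parsing them.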

>  1) when a message is displayed to the user as a result of a match, it
>     gets pulled from one of the files, not both.  if it's pulled from
>     the file that didn't have the term the user searched for, that's
>     likely to be confusing.  do you have a way to avoid that confusion?

I was looking for an incremental improvement, so I imagined the
various output formats flagging "yes, there are duplicate files for
this message", and letting users dig those out using something like
the --duplicate= option.
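
For example (with a made-up message-id):

    notmuch search --output=files id:example@example.net
    notmuch search --output=files --duplicate=2 id:example@example.net

The first lists every file associated with the message, the second
just the second copy.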

> It also occurs to me that one of the things i'd love to have is
> well-indexed notes about any given e-mail.  So if this was adopted, i
> could presumably just write a file that has the same Message-Id as the
> message, put my notes in it, and index it.  that's a little weird,
> though.  would there be a better way to do such a thing?
>
>          --dkg

One option would be to use a note=foo message property. That's not
immediately searchable, although we could kludge together something
like the subject regexp search, which would be slower.
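
Something like this untested sketch, using the existing properties
API (the "note" key is just an example, nothing reserved):

/* Attach a free-form note to an existing message; error handling
 * mostly elided. */
#include <notmuch.h>

int
main (int argc, char **argv)
{
    notmuch_database_t *db;
    notmuch_message_t *message;

    /* argv[1]: path to the mail store, argv[2]: a message-id */
    if (argc < 3)
        return 1;

    if (notmuch_database_open (argv[1],
                               NOTMUCH_DATABASE_MODE_READ_WRITE, &db))
        return 1;

    if (notmuch_database_find_message (db, argv[2], &message) ||
        ! message) {
        notmuch_database_destroy (db);
        return 1;
    }

    notmuch_message_add_property (message, "note",
                                  "remember to follow up on this");

    notmuch_message_destroy (message);
    notmuch_database_destroy (db);
    return 0;
}

That keeps the note out of the message files themselves, at the cost
of it only living in the database.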

d
