Re: [RFC patch 2/2] lib: index message files with duplicate message-ids

Subject: Re: [RFC patch 2/2] lib: index message files with duplicate message-ids

Date: Wed, 22 Mar 2017 19:29:30 +0200

To: David Bremner, Daniel Kahn Gillmor, notmuch@notmuchmail.org

From: Jani Nikula


On Thu, 16 Mar 2017, David Bremner <david@tethera.net> wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>> On Wed 2017-03-15 21:57:28 -0400, David Bremner wrote:
>>> The corresponding xapian document just gets more terms added to it,
>>> but this doesn't seem to break anything.
>>
>> this is an interesting suggestion.  thanks for proposing it!
>>
>> A couple questions:
>>
>>  0) what happens when one of the files gets deleted from the message
>>     store? do the terms it contributes get removed from the index?
>>
>
> That's a good question, and an issue I hadn't thought about.
> Currently there's no way to do this short of deleting all the terms for
> all the files (excepting tags and properties, presumably) and
> reindexing. This will require some more thought, I think.

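We already see a version of this issue today: the first file gets
indexed, a second file with the same message-id gets added, and then the
first file gets removed. The index then still holds the terms of a file
that is gone.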
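
To make that sequence concrete, here is a toy sketch in Python. It is
only a model: plain dicts and sets stand in for the database, the helper
names are made up for illustration, and nothing here is the notmuch or
Xapian API. It assumes every file's terms are merged into one document
per message-id, as the patch proposes.

```python
# Toy model only: plain dicts and sets, not the notmuch or Xapian API.

def index_file(index, files, message_id, path, terms):
    """Merge a file's terms into the single document for message_id."""
    files.setdefault(message_id, []).append(path)
    index.setdefault(message_id, set()).update(terms)

def remove_file(index, files, message_id, path):
    """Forget the path. The merged term set is left untouched, because
    nothing records which terms came from which file."""
    files[message_id].remove(path)

index, files = {}, {}
index_file(index, files, "id@example", "a", {"hello", "world"})
index_file(index, files, "id@example", "b", {"hello", "footer"})
remove_file(index, files, "id@example", "a")

# Terms still indexed that the one remaining file does not contain.
stale = index["id@example"] - {"hello", "footer"}
print(stale)
```

In this model the only way to get rid of the stale terms is to clear the
document's terms and reindex the remaining files, which matches what
David describes above.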

There's also the related problem of reindexing potentially changing the
file being indexed and returned. The first time around the indexing
order is likely the order the message files were received in; on
reindexing it's the order the message files are encountered in the file
system. I presume the patch at hand keeps the search terms that find the
messages the same regardless of the indexing order.
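
As a rough illustration of that presumption, the toy sketch below
compares the two schemes over every possible indexing order. The file
names and terms are invented, and sets stand in for the real index; it
says nothing about which file would be returned, only about the terms.

```python
# Toy model only: sets stand in for the terms of each message file.
from itertools import permutations

file_terms = {
    "list-copy": {"hello", "world", "footer"},
    "direct-copy": {"hello", "world"},
}

def first_file_terms(order):
    """Terms when only the first file encountered is indexed."""
    return set(file_terms[order[0]])

def merged_terms(order):
    """Terms when every file with the message-id is indexed."""
    merged = set()
    for name in order:
        merged |= file_terms[name]
    return merged

orders = list(permutations(file_terms))
first_results = {frozenset(first_file_terms(o)) for o in orders}
merged_results = {frozenset(merged_terms(o)) for o in orders}

# Number of distinct term sets each scheme can produce across orders.
print(len(first_results), len(merged_results))
```

Indexing only the first file yields a term set that depends on the
order, while the merged set is a union and so comes out the same either
way.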

BR,
Jani.
