Re: mass removal of duplicates

Date: Fri, 01 Aug 2025 10:27:27 +0200

To: Alan Schmitt, David Bremner, notmuch, Jameson Graef Rollins

From: Anton Khirnov


Quoting Jameson Graef Rollins (2025-07-31 17:06:58)
> On Thu, Jul 31 2025, Alan Schmitt <alan.schmitt@polytechnique.org> wrote:
> > Hello David,
> >
> > On 2025-07-31 07:51, David Bremner <david@tethera.net> writes:
> >
> >> With the caveat that it is always good to have backups, something like
> >>
> >>      notmuch search --duplicate=2 --output=files '*' | xargs rm
> >
> > This is a great start, thanks. I think I’ll build up on this to have a
> > tool that shows me the duplicates side by side so that I can choose
> > which one to keep.
> 
> I'll note that notmuch determines message uniqueness based solely on the
> message ID, so it's possible for two messages to be considered
> "duplicate" if their message IDs are the same even if their content is
> completely different.

"completely different" is a pathological case that really shouldn't
happen, but it's quite common to have multiple instances of "the same"
email received through multiple paths - e.g. directly and via a mailing
list. The different copies will then have the same "main content", but
different routing headers, and MLs often add footers and such (IMO they
shouldn't, or at least they should add them in separate MIME parts, but
it is what it is).
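
To make the "same main content, different headers" point concrete, here is
a rough sketch of how one might check whether two copies of a message differ
only in their headers. The helper names msg_body and same_body are made up
for illustration and are not part of notmuch. The sketch assumes plain
LF line endings and treats everything after the first blank line as the
body, so a copy with an appended list footer will (correctly) compare as
different.

```shell
# Print the body of a message file: everything after the first blank line,
# which is where the header block ends. Assumes LF line endings.
msg_body() {
    sed '1,/^$/d' "$1"
}

# Succeed if two message files have byte-identical bodies, ignoring all
# headers (Received:, List-Id:, and so on).
same_body() {
    [ "$(msg_body "$1" | cksum)" = "$(msg_body "$2" | cksum)" ]
}
```

The files to compare could come from something like
notmuch search --output=files id:<some-message-id>, which should list every
file notmuch has recorded for that message ID. If same_body fails for a
pair, the copies differ in more than routing headers and probably deserve a
look before either one is deleted.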

-- 
Anton Khirnov
_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-leave@notmuchmail.org
