On Mon, 02 Jun 2014, Mark Walters <markwalters1009@gmail.com> wrote:
> Tomi Ollila <tomi.ollila@iki.fi> writes:
>
>> On Mon, Jun 02 2014, Mark Walters <markwalters1009@gmail.com> wrote:
>>
>>> Vladimir Marek <Vladimir.Marek@oracle.com> writes:
>>>
>>> If you want to save disk space then you could delete the duplicates
>>> after with something like
>>>
>>> notmuch search --output=files --format=text0 --duplicate=2 '*' piped to
>>> xargs -0
>>
>> What if there are 3 duplicates (or 4... ;)
>
> I was assuming that it was merging 2 duplicate-free bunches of messages,
> but I guess the new 100000 might not be. In that case running the above
> repeatedly (ie until it is a no-op) would be fine.

With 'notmuch new' in between the runs, obviously.

Alternatively, find the biggest --duplicate=N which still outputs
something, and run the command for each N...2.

>> One should also have some message content heuristics to determine that the
>> content is indeed duplicate and not something totally different (not that
>> we can see the different content anyway... but...)
>
> That would be nice. And quite hard.

BR, Jani.
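
The "run repeatedly, with 'notmuch new' in between" idea could be sketched
roughly like this (untested sketch, not from the thread; it permanently
deletes files, so only try it on a backed-up mail store):

```shell
#!/bin/sh
# Repeatedly delete every file that is the 2nd-or-later copy of its
# message-id, re-indexing between passes, until a pass is a no-op.
# WARNING: destructive; run against a backed-up mail store only.
while :; do
    dups=$(notmuch search --output=files --duplicate=2 '*')
    [ -z "$dups" ] && break
    # text0 output plus xargs -0 handles filenames containing whitespace.
    notmuch search --output=files --format=text0 --duplicate=2 '*' |
        xargs -0 rm -f
    notmuch new    # re-index so the next pass sees the updated duplicates
done
```

With more than two copies per message this loops once per extra copy,
which matches the "for each N...2" alternative without having to find N
first.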