Mark Walters <markwalters1009@gmail.com> writes:

> Vladimir Marek <Vladimir.Marek@oracle.com> writes:
>
>>> > I want to import bigger chunk of archived messages into my notmuch
>>> > database. It's about 100k messages. The problem is, that I most probably
>>> > have quite a lot of those messages in the DB. Basically I would like to
>>> > add only those I don't have already.
>>> >
>>> > There are two possibilities
>>> >
>>> > a) I will add all the 100k messages and then remove the duplicities.
>>> >
>>> > b) I will write a script which will parse the message ID's of the
>>> > to-be-added messages and try to match them to the notmuch DB. Adding
>>> > only files I can't find already.
>>> >
>>> > Ad b) might be better option, but I started to play with the idea of
>>> > deduplication. I'm thinking about listing all the message IDs stored in
>>> > DB, listing all files belonging to the IDs and deleting all but one.
>>> > Also I'm thinking about implementing some simple algorithm telling me
>>> > whether the messages are really very similar. Just to be sure I don't
>>> > delete something I don't want to.
>>> >
>>> > Was anyone playing with the idea?
>>>
>>> notsync[1] used the (lack of) existence of a message id in the store to
>>> decide whether to add something from an IMAP server, but it is old,
>>> crufty, unused and unloved code.
>>
>> I see, that's close to my b) solution, thanks!
>
> Did you mean a) here? The idea was to add them all first and then run
> this script to delete the duplicates.
>

Sorry: out of order arrival times and lack of care on my part.

Sorry!

MW

> Best wishes
>
> Mark
>
>> --
>> Vlad
>> _______________________________________________
>> notmuch mailing list
>> notmuch@notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch
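Option b) from the quoted thread (add only the files whose Message-ID is not already in the database) can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the notsync code. The names `parse_known_ids`, `message_id_of` and `files_to_add` are illustrative. It assumes the known IDs come from the output of `notmuch search --output=messages '*'`, which prints one `id:<message-id>` line per message with the angle brackets already removed. Files with no Message-ID header cannot be matched, because notmuch synthesizes an ID for such messages, so the sketch treats them as new.

```python
import email
from pathlib import Path
from typing import Iterable, List, Optional, Set


def parse_known_ids(lines: Iterable[str]) -> Set[str]:
    """Collect bare message IDs from 'id:<msgid>' lines (assumed notmuch search output)."""
    known = set()
    for line in lines:
        line = line.strip()
        if line.startswith("id:"):
            known.add(line[len("id:"):])
    return known


def message_id_of(raw: bytes) -> Optional[str]:
    """Return the Message-ID header without angle brackets, or None if absent."""
    msg = email.message_from_bytes(raw)
    mid = msg.get("Message-ID")
    if mid is None:
        return None
    return str(mid).strip().lstrip("<").rstrip(">")


def files_to_add(paths: Iterable[str], known_ids: Set[str]) -> List[str]:
    """Keep files whose Message-ID is neither in the DB nor already seen in this batch."""
    new_files = []
    seen_in_batch = set()
    for path in paths:
        mid = message_id_of(Path(path).read_bytes())
        if mid is None:
            # No header to match on: assume the message is new.
            new_files.append(path)
            continue
        if mid not in known_ids and mid not in seen_in_batch:
            new_files.append(path)
        # Later copies of the same ID inside the archive are skipped too.
        seen_in_batch.add(mid)
    return new_files
```

In practice `known_ids` could be built from `subprocess.run(["notmuch", "search", "--output=messages", "*"], capture_output=True, text=True).stdout.splitlines()`, and the returned paths copied into the maildir before running `notmuch new`. Unlike option a), this never touches files already in the store, so nothing has to be deleted afterwards.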