On Mon, Jun 02 2014, Vladimir Marek wrote:
> Hi,
>
> I want to import a bigger chunk of archived messages into my notmuch
> database. It's about 100k messages. The problem is that I most probably
> already have quite a lot of those messages in the DB. Basically I would
> like to add only those I don't have already.
>
> There are two possibilities:
>
> a) I will add all the 100k messages and then remove the duplicates.
>
> b) I will write a script which will parse the message IDs of the
> to-be-added messages and try to match them to the notmuch DB, adding
> only files I can't find already.
>
> Ad b) might be the better option, but I started to play with the idea of
> deduplication. I'm thinking about listing all the message IDs stored in
> the DB, listing all files belonging to each ID and deleting all but one.
> Also I'm thinking about implementing some simple algorithm telling me
> whether the messages are really very similar, just to be sure I don't
> delete something I don't want to.
>
> Was anyone playing with the idea?

notsync[1] used the (lack of) existence of a message id in the store to
decide whether to add something from an IMAP server, but it is old,
crufty, unused and unloved code.

> --
> Vlad

Footnotes:
[1] https://github.com/dme/notsync
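
For what it's worth, option b) from the quoted message (the same "is this
message id already in the store?" test) could be sketched roughly as below.
This is not the notsync code, just a minimal illustration. It assumes the
`notmuch` binary is on the PATH and that `notmuch count id:<message-id>`
prints the number of matching messages. The function names
(`extract_message_id`, `known_to_notmuch`, `select_new`) are made up for this
example, and message ids containing spaces or quotes would need extra quoting
that the sketch does not attempt.

```python
import subprocess
from email.parser import BytesHeaderParser
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional


def extract_message_id(path: Path) -> Optional[str]:
    """Return the Message-ID of a mail file without angle brackets, or None."""
    with open(path, "rb") as fh:
        # Parse only the headers; bodies are irrelevant for this check.
        headers = BytesHeaderParser().parse(fh)
    raw = headers.get("Message-ID")
    if raw is None:
        return None
    return str(raw).strip().strip("<>") or None


def known_to_notmuch(message_id: str) -> bool:
    """Ask the notmuch CLI whether the database already has this id.

    Assumption: the id needs no special quoting in the query.
    """
    result = subprocess.run(
        ["notmuch", "count", f"id:{message_id}"],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip()) > 0


def select_new(
    paths: Iterable[Path],
    is_known: Callable[[str], bool] = known_to_notmuch,
) -> Iterator[Path]:
    """Yield only the files whose Message-ID is not yet known.

    Duplicates inside the batch itself are also skipped. Files with no
    Message-ID header are passed through, since this sketch cannot judge them.
    """
    seen = set()
    for path in paths:
        message_id = extract_message_id(path)
        if message_id is None:
            yield path
            continue
        if message_id in seen or is_known(message_id):
            continue
        seen.add(message_id)
        yield path
```

The files yielded by `select_new` would then be copied into the mail store
before running `notmuch new`. One process per message is slow for 100k files;
dumping all known ids once into a set and passing `is_known=known_ids.__contains__`
would probably be the first thing to change.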