On Tue 09 Aug 2011 14:02, Tomi Ollila <tomi.ollila@nixu.com> writes:

> Hi
>
> I get this output:
>
> $ notmuch new --verbose
> Found 15559 total files (that's not much mail).
> Processed 15559 total files in 5m 53s (43 files/sec.).
> Added 15546 new messages to the database.
>
> $ find * -type f | wc
> 15559 15559 529027
>
> How can I determine which 13 files were dropped. All of those
> 15559 files should be mails. I tried to check through mail files that
> have no 'Subject:' header but those were (at least one) indexed. Could
> it be about duplicate Message-ID: or something ?
>
> $ notmuch --version
> notmuch 0.7-7-g68e8560

It is about duplicate Message-IDs: the 13 missing files carry a
Message-ID that another file already has.

It would be nice if 'notmuch new' printed information about this when
it happens (as I recall it does when a newly found file is not
considered a mail file).

The steps I took to figure this out are at the end of this email (not
all iterations with and without 'wc' are shown).

>
> Tomi

Tomi

--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--

$ find ~/mail/mails/* -type f | sort >! filenames-fs
$ wc filenames-fs
15559 15559 855766 filenames-fs

$ cd /path/to/notmuch-git/bindings/python
$ cat > foo.py
import notmuch

db = notmuch.Database()
msgs = notmuch.Query(db, '').search_messages()
for f in msgs:
    print(f.get_filename())

$ PYTHONPATH=/path/to/python-json:`pwd` python foo.py | sort > filenames-db
$ wc filenames-db
15546 15546 855037 filenames-db

$ diff filenames-db filenames-fs | grep mails | wc
13 26 755

$ cd ~/mail
$ cat >midcheck.pl
use strict;
use warnings;

my %msgids;

foreach (<mails/*/*>) {
    my $fn = $_;
    my $mid;
    open I, '<', $fn or die $!;
    while (<I>) {
        $mid = $1, next if /^Message-ID:\s*(.*)/i;
        last if /^$/;    # blank line: end of headers
    }
    close I;
    unless ($mid) {
        print "$fn: no Message-ID (in same line with header tag?)\n";
        next;
    }
    # Remember the first file seen for each Message-ID; report later ones.
    my $fn0 = $msgids{$mid};
    if (defined $fn0) {
        print "Files '$fn0' and '$fn' have same msg id: $mid\n";
    }
    else {
        $msgids{$mid} = $fn;
    }
}

$ perl midcheck.pl | wc
13 117 2098
$ perl midcheck.pl | grep \^Files | wc
13 117 2098
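
For anyone who would rather not use Perl, the same duplicate check can
be sketched in Python 3 with only the standard library. This is a rough
equivalent of midcheck.pl, not something notmuch provides: the names
read_message_id and find_duplicates are made up for illustration, the
default directory "mails" is taken from the transcript above, and, like
the Perl script, it assumes the Message-ID value sits on the same line
as the header name. Unlike the Perl script it walks the tree
recursively and groups all files per Message-ID instead of reporting
pairs.

```python
import os
import re
import sys
from collections import defaultdict

# Same pattern as midcheck.pl: the ID must be on the header line itself.
MSGID_RE = re.compile(r"^Message-ID:\s*(.*)", re.IGNORECASE)


def read_message_id(path):
    """Return the last Message-ID found in the header block, or None."""
    mid = None
    # latin-1 decodes any byte sequence, so odd mail encodings cannot raise.
    with open(path, "r", encoding="latin-1") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if line == "":
                break  # blank line: end of headers
            m = MSGID_RE.match(line)
            if m:
                mid = m.group(1).strip()
    return mid or None


def find_duplicates(root):
    """Map each Message-ID seen more than once to the files carrying it."""
    by_id = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mid = read_message_id(path)
            if mid is not None:
                by_id[mid].append(path)
    return {mid: paths for mid, paths in by_id.items() if len(paths) > 1}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "mails"
    for mid, paths in sorted(find_duplicates(root).items()):
        print("%s: %s" % (mid, ", ".join(paths)))
```

Each output line names one Message-ID and every file that carries it.
Summing len(paths) - 1 over the groups gives the number of files that
share an ID with an earlier file, which is the figure to compare against
the difference between the filesystem and database counts.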