David Bremner <david@tethera.net> writes:

> 'quite' on IRC reported that notmuch new was grinding to a halt during
> initial indexing, and we eventually narrowed the problem down to some
> html parts with large embedded images. These cause the number of terms
> added to the Xapian database to explode (the first 400 messages
> generated 4.6M unique terms), and of course the resulting terms are
> not much use for searching.
>
> The second test is sanity check for any "improved" indexing of HTML.

pushed the first patch in the series to master.

d