> -----Original Message-----
> From: David Bremner [mailto:david@tethera.net]
> Sent: Saturday, March 18, 2017 2:15 PM
> To: Jeffrey Stedfast <jestedfa@microsoft.com>; notmuch@notmuchmail.org
> Subject: RE: [PATCH] test: add known broken test for indexing html
>
> Jeffrey Stedfast <jestedfa@microsoft.com> writes:
>
> > Hey David,
> >
> > I actually have an HTML tokenizer for MimeKit for (among other things)
> > this type of purpose. Perhaps I need to port that to C and include
> > that with GMime 😊
> >
> > https://github.com/jstedfast/MimeKit/tree/master/MimeKit/Text
> >
> > Jeff
>
> That's probably a good idea in your abundant spare time ;). More generally
> though we've thought about letting users provide filters to convert
> attachements (e.g. .odt / .docx / pdf) to text. I'm not sure about the
> performance hit, but I guess that would work for html as well.
> I guess in principle it should be possible to write GMime filter that manages
> the child process.
>
> d

Hah, yeah... it'll probably be a while. I need to focus on GMime 3.0 first. Once I get that squared away, I can look at porting other handy features back from MimeKit 😊

Jeff