Jeffrey Stedfast <jestedfa@microsoft.com> writes:

> Hey David,
>
> I actually have an HTML tokenizer for MimeKit for (among other things) this type of purpose. Perhaps I need to port that to C and include that with GMime 😊
>
> https://github.com/jstedfast/MimeKit/tree/master/MimeKit/Text
>
> Jeff

That's probably a good idea in your abundant spare time ;).

More generally though, we've thought about letting users provide filters to convert attachments (e.g. .odt / .docx / .pdf) to text. I'm not sure about the performance hit, but I guess that would work for HTML as well. I guess in principle it should be possible to write a GMime filter that manages the child process.

d