RE: [PATCH] test: add known broken test for indexing html

Subject: RE: [PATCH] test: add known broken test for indexing html

Date: Sat, 18 Mar 2017 15:14:53 -0300

To: Jeffrey Stedfast, notmuch@notmuchmail.org

Cc:

From: David Bremner


Jeffrey Stedfast <jestedfa@microsoft.com> writes:

> Hey David,
>
> I actually have an HTML tokenizer for MimeKit for (among other things) this type of purpose. Perhaps I need to port that to C and include that with GMime 😊
>
> https://github.com/jstedfast/MimeKit/tree/master/MimeKit/Text
>
> Jeff

That's probably a good idea in your abundant spare time ;).  More
generally, though, we've thought about letting users provide filters to
convert attachments (e.g. .odt / .docx / PDF) to text. I'm not sure
about the performance hit, but I guess that would work for HTML as well.
In principle it should be possible to write a GMime filter that
manages the child process.

d
