Jeffrey Stedfast <jestedfa@microsoft.com> writes:

> Hi David,
>
> Base64 encoded inline image data is always within the src attribute value of an <img> tag and will always begin with "data:" followed by the mime-type and then followed by ";base64," so it's pretty easy to spot.
>
> While on this topic, why index HTML attribute values at all? Other than perhaps some known ones like perhaps the 'alt' value of <img> tags?
>
> I would argue that the only portion of any HTML that you should be indexing at all for searching is the character data between tags.

We're not currently parsing the HTML, so none of these distinctions are really available to us. Maybe adding an HTML parser is the right solution, but it's a bit non-trivial.

d
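
For illustration only, here is a rough sketch of the two ideas discussed above, written in Python with the standard library's html.parser rather than anything from the indexer itself. The names is_base64_data_uri, IndexableText, indexable_text, INDEXED_ATTRS and SKIPPED_ELEMENTS are all hypothetical, and skipping <script> and <style> content is an extra assumption not raised in the thread.

```python
from html.parser import HTMLParser

# Attribute values considered worth indexing (assumption: only 'alt',
# following the suggestion in the quoted message).
INDEXED_ATTRS = {"alt"}

# Elements whose character data is not reader-visible text (assumption).
SKIPPED_ELEMENTS = {"script", "style"}


def is_base64_data_uri(value):
    """Return True if value looks like 'data:<mime-type>;base64,<payload>'."""
    if not value.lower().startswith("data:"):
        return False
    header, sep, _payload = value.partition(",")
    return bool(sep) and header.lower().endswith(";base64")


class IndexableText(HTMLParser):
    """Collect character data between tags, plus selected attribute values."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_ELEMENTS:
            self._skip_depth += 1
            return
        # All attribute values (including src) are ignored unless listed.
        for name, value in attrs:
            if name in INDEXED_ATTRS and value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def indexable_text(html):
    """Return the text of an HTML fragment that would be handed to the indexer."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Without a parser, something like is_base64_data_uri could only be applied to tokens that have already been split out of the markup somehow; with a parser, the src value never reaches the index in the first place, which is the stronger argument for parsing.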