Jeffrey Stedfast <jestedfa@microsoft.com> writes:

> Base64 encoded inline image data is always within the src attribute
> value of an <img> tag and will always begin with "data:" followed by
> the mime-type and then followed by ";base64," so it's pretty easy to
> spot.
>
> While on this topic, why index HTML attribute values at all? Other
> than perhaps some known ones like perhaps the 'alt' value of <img>
> tags?
>
> I would argue that the only portion of any HTML that you should be
> indexing at all for searching is the character data between tags.

I should mention that we also have a fair amount of base64 gunk from
inline PGP signatures. I'm not sure if it's just ugly to look at when
dumping the database term, or if it actually makes a measurable
difference in time/space usage.

d
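
For illustration only, here is a rough Python sketch of the two ideas in the
quoted text: spotting the "data:<mime-type>;base64," prefix in an <img> src
value, and indexing only the character data between tags (plus the 'alt'
value of <img>). It is not taken from any existing indexer; the names
is_inline_base64, TextOnlyExtractor and extract_indexable_text are
hypothetical, and the regular expression assumes the simple prefix form
described above.

```python
import re
from html.parser import HTMLParser

# Assumption: inline image data looks like "data:<mime-type>;base64,<payload>",
# as described in the quoted text. Other data: URI variants are not handled.
DATA_URI_RE = re.compile(r"^data:[^;,]+;base64,", re.IGNORECASE)


def is_inline_base64(src):
    """Return True if an attribute value starts with a base64 data: prefix."""
    return bool(DATA_URI_RE.match(src.strip()))


class TextOnlyExtractor(HTMLParser):
    """Collect character data between tags, plus the alt text of <img> tags.

    All other attribute values (including src) are ignored, so base64
    payloads in src never reach the collected text.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skipped_inline_images = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        if alt:
            self.chunks.append(alt)
        # Only counted for reporting; the src value is never indexed.
        if is_inline_base64(attr_map.get("src") or ""):
            self.skipped_inline_images += 1

    def handle_data(self, data):
        self.chunks.append(data)


def extract_indexable_text(html):
    """Return whitespace-normalized text suitable for handing to an indexer."""
    parser = TextOnlyExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

Note that this sketch does nothing about base64 in inline PGP signatures,
which live in the message text rather than in HTML attributes and would need
separate handling.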