On Wed, 3 Aug 2011 16:47:32 -0400, Austin Clements <amdragon@mit.edu> wrote:
> The patch I posted above includes message ID's in search results as a
> proxy for the match set (which can then be used in a tagging operation
> to tag exactly the results you saw). However, from an efficiency
> standpoint, it makes more sense to capture the match set directly as
> document ID's.
>
> I've had an implementation of this for a while, but finally got around
> to benchmarking the difference between tagging using message ID's
> versus using document ID's. It looks like tagging spends about 2/3rds
> of its time performing queries, and only about 1/3rd actually tagging,
> so tagging using document ID's is 3x-4x faster.

Wow, this sounds very cool, Austin.

> The downside to using document ID's is that we need API's to expose
> them. My prototype exposes these as opaque "object ID"s, which acts a
> lot like message IDs, but has no intrinsic meaning outside of the
> library. This needs two library functions: one to retrieve a
> message's object ID and another to retrieve a message by object ID.

This sounds totally reasonable to me. Maybe we could use something like
"oid:" from the command line?

> 3x-4x isn't enough to make me jump on this added complexity, but it's
> enough to make me consider it. Carl, I'd love to hear your thoughts.

Imho 3x-4x is actually a pretty huge improvement. Is it really that
much of an added complexity to add those two functions? That actually
seems like a relatively simple patch to me.

jamie.
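
For what it's worth, the 3x figure falls straight out of the quoted timing split. This is only a rough sketch, and it assumes that tagging by document ID removes the query phase entirely. That is an assumption, not something measured in this thread, since looking up a document by ID presumably still costs something. The `speedup` helper is hypothetical and just does the arithmetic:

```python
from fractions import Fraction


def speedup(query_fraction):
    """Speedup if the query phase vanishes and only tagging time remains.

    query_fraction is the share of total tagging time spent in queries.
    Assumption: document-ID lookup is treated as free, which is optimistic.
    """
    remaining = 1 - Fraction(query_fraction)
    return 1 / remaining


# "about 2/3rds" of the time in queries leaves 1/3rd of the work: 3x
print(speedup(Fraction(2, 3)))

# a query share nearer 3/4 would correspond to the upper end: 4x
print(speedup(Fraction(3, 4)))
```

So the quoted 3x-4x range corresponds to queries taking somewhere between 2/3 and 3/4 of the total time under this simplified model.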