Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes: > On Tue 2016-05-31 21:52:06 -0400, Daniel Kahn Gillmor wrote: > > do we actually need this abstraction? If we're aiming to build specific > new features (the two i'm thinking of are cryptographic-session-keys and > reference-adjustments), couldn't we implement those features explicitly > in xapian with their own special prefix, rather than treating them as a > generic "property"? Sure, it's certainly possible. I guess if you don't care about the possibility of iterating all pairs with given key prefix (which I admit makes more sense for the config API), then the code could be simplified to look more like the tag list handling code. C is pretty crap at generics, but I guess looking at tags.c, it's really about iterators for notmuch_string_list_t. So it could probably be generalized to serve here. For each such prefix, one would need to roughly duplicate patches 1/5 and 3/5. It took me a little while to figure 1/5 out, but now that I know, it would be less trouble. I guess my thinking here was that I would provide a low level interface that people using the C API or bindings could use without hacking xapian. > We already have a bit of an uncomfortable fit with tags and special > flags (encrypted, signed, attachment, etc), where some are expected to > be set and cleared automagically and some are expected to be manipulated > directly by the user. Are we setting ourselves up for more of the same, > or is there a principled way that a user can know which properties it's > kosher for them to set and clear, and which ones they should leave > alone? XPROPERTY is an internal prefix, which means it isn't added to the query parser. As it happens, I didn't plan on CLI access to these terms either. Both of those choices are tradeoffs to say that these are internal metadata, suitable for manipulation by programs. Such programs could be scripts using python or ruby. > If we add new specific features, we could potentially augment the dump > format explicitly for them, without having the property abstraction. We could, but I think should change the dump format quite rarely, since we risk breaking people's scripts. So if we did it for one prefix, I'd like to do in an extensible way so that adding new prefixes is somewhat transparent. It also means some duplication of effort/code in notmuch dump/restore to dump/restore each new prefix. It's probably true that per-prefix dump format would be more compact, since the keys would be implicit, rather than repeated for every pair. > We already have some explicit features for each message (subject, > from, to, attachment, mimetype, thread id, etc), and most of them are > derived from the message itself, with the hope that it could be > re-derived given just the message body. Is there a distinction > between properties that can be derived from the message body and > properties that need to be additionally derived from some other data? As Tomi always says, naming is the hardest thing; properties is a bit generic. I'm not sure the distinction you make between the "message" and the "message body" here. I think most of our derived terms are from the message header. My intent here is that "properties" are used for things that cannot be derived from the message (header or body). TL;DR: - per prefix requires new code in the library and dump/restore for every prefix + the dump format might be more compact if done in a per prefix way. + this code would be simpler than the generic properties code, mainly because it would not need key value pairs, - the library and dump/restore are parts of notmuch that have the potential to "break the world". Not too many people are comfortable hacking on them. - changing the dump format is something like an ABI change for people whose scripts rely on dump / restore.