Re: [PATCH 05/14] lib: Database version 3: Introduce fine-grained "features"

Subject: Re: [PATCH 05/14] lib: Database version 3: Introduce fine-grained "features"

Date: Sun, 27 Jul 2014 10:53:25 +0100

To: Austin Clements, notmuch@notmuchmail.org

Cc:

From: Mark Walters


On Sun, 27 Jul 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> Previously, our database schema was versioned by a single number.
> Each database schema change had to occur "atomically" in Notmuch's
> development history: before some commit, Notmuch used version N, after
> that commit, it used version N+1.  Hence, each new schema version
> could introduce only one change, the task of developing a schema
> change fell on a single person, and it all had to happen and be
> perfect in a single commit series.  This made introducing a new schema
> version hard.  We've seen only two schema changes in the history of
> Notmuch.

I like this series as a whole and I am happy with all the patches that I
haven't sent comments on. (Note, however, that I am not very familiar
with the database code.)

There is one thing which confuses me in this patch:

>
> This commit introduces database schema version 3; hopefully the last
> schema version we'll need for a while.  With this version, we switch
> from a single version number to "features": a set of named,
> independent aspects of the database schema.
>
> Features should make backwards compatibility easier.  For many things,
> it should be easy to support databases both with and without a
> feature, which will allow us to make upgrades optional and will enable
> "unstable" features that can be developed and tested over time.
>
> Features also make forwards compatibility easier.  The features
> recorded in a database include "compatibility flags," which can
> indicate to an older version of Notmuch when it must support a given
> feature to open the database for read or for write.  This lets us
> replace the old vague "I don't recognize this version, so something
> might go wrong, but I promise to try my best" warnings upon opening a
> database with an unknown version with precise errors.  If a database
> is safe to open for read/write despite unknown features, an older
> version will know that and issue no message at all.  If the database
> is not safe to open for read/write because of unknown features, an
> older version will know that, too, and can tell the user exactly which
> required features it lacks support for.
> ---
>  lib/database-private.h |  56 +++++++++++++++
>  lib/database.cc        | 189 +++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 214 insertions(+), 31 deletions(-)
>
> diff --git a/lib/database-private.h b/lib/database-private.h
> index d3e65fd..323b9fe 100644
> --- a/lib/database-private.h
> +++ b/lib/database-private.h
> @@ -46,6 +46,11 @@ struct _notmuch_database {
>      int atomic_nesting;
>      Xapian::Database *xapian_db;
>  
> +    /* Bit mask of features used by this database.  Features are
> +     * named, independent aspects of the database schema.  This is a
> +     * bitwise-OR of NOTMUCH_FEATURE_* values (below). */
> +    unsigned int features;
> +
>      unsigned int last_doc_id;
>      uint64_t last_thread_id;
>  
> @@ -55,6 +60,57 @@ struct _notmuch_database {
>      Xapian::ValueRangeProcessor *date_range_processor;
>  };
>  
> +/* Bit masks for _notmuch_database::features. */
> +enum {
> +    /* If set, file names are stored in "file-direntry" terms.  If
> +     * unset, file names are stored in document data.
> +     *
> +     * Introduced: version 1.  Implementation support: both for read;
> +     * required for write. */
> +    NOTMUCH_FEATURE_FILE_TERMS = 1 << 0,
> +
> +    /* If set, directory timestamps are stored in documents with
> +     * XDIRECTORY terms and relative paths.  If unset, directory
> +     * timestamps are stored in documents with XTIMESTAMP terms and
> +     * absolute paths.
> +     *
> +     * Introduced: version 1.  Implementation support: required. */
> +    NOTMUCH_FEATURE_DIRECTORY_DOCS = 1 << 1,
> +
> +    /* If set, the from, subject, and message-id headers are stored in
> +     * message document values.  If unset, message documents *may*
> +     * have these values, but if the value is empty, it must be
> +     * retrieved from the message file.
> +     *
> +     * Introduced: optional in version 1, required as of version 3.
> +     * Implementation support: both.
> +     */
> +    NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES = 1 << 2,
> +
> +    /* If set, folder terms are boolean and path terms exist.  If
> +     * unset, folder terms are probabilistic and stemmed and path
> +     * terms do not exist.
> +     *
> +     * Introduced: version 2.  Implementation support: required. */
> +    NOTMUCH_FEATURE_BOOL_FOLDER = 1 << 3,
> +};
> +
> +/* Prior to database version 3, features were implied by the database
> + * version number, so hard-code them for earlier versions. */
> +#define NOTMUCH_FEATURES_V0 (0)
> +#define NOTMUCH_FEATURES_V1 (NOTMUCH_FEATURES_V0 | NOTMUCH_FEATURE_FILE_TERMS | \
> +			     NOTMUCH_FEATURE_DIRECTORY_DOCS)
> +#define NOTMUCH_FEATURES_V2 (NOTMUCH_FEATURES_V1 | NOTMUCH_FEATURE_BOOL_FOLDER)
> +
> +/* Current database features.  If any of these are missing from a
> + * database, request an upgrade.
> + * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES is not included because
> + * upgrade doesn't currently introduce the feature (though brand new
> + * databases will have it). */
> +#define NOTMUCH_FEATURES_CURRENT \
> +    (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
> +     NOTMUCH_FEATURE_BOOL_FOLDER)
> +
>  /* Return the list of terms from the given iterator matching a prefix.
>   * The prefix will be stripped from the strings in the returned list.
>   * The list will be allocated using ctx as the talloc context.
> diff --git a/lib/database.cc b/lib/database.cc
> index 45c4260..03eef3e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -20,6 +20,7 @@
>  
>  #include "database-private.h"
>  #include "parse-time-vrp.h"
> +#include "string-util.h"
>  
>  #include <iostream>
>  
> @@ -42,7 +43,7 @@ typedef struct {
>      const char *prefix;
>  } prefix_t;
>  
> -#define NOTMUCH_DATABASE_VERSION 2
> +#define NOTMUCH_DATABASE_VERSION 3
>  
>  #define STRINGIFY(s) _SUB_STRINGIFY(s)
>  #define _SUB_STRINGIFY(s) #s
> @@ -151,6 +152,17 @@ typedef struct {
>   *			changes are made to the database (such as by
>   *			indexing new fields).
>   *
> + *	features	The set of features supported by this
> + *			database. This consists of a set of
> + *			'\n'-separated lines, where each is a feature
> + *			name, a '\t', and compatibility flags.  If the
> + *			compatibility flags contain 'w', then the
> + *			opener must support this feature to safely
> + *			write this database.  If the compatibility
> + *			flags contain 'r', then the opener must
> + *			support this feature to read this database.
> + *			Introduced in database version 3.
> + *
>   *	last_thread_id	The last thread ID generated. This is stored
>   *			as a 16-byte hexadecimal ASCII representation
>   *			of a 64-bit unsigned integer. The first ID
> @@ -226,6 +238,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
>      { "subject",		"XSUBJECT"},
>  };
>  
> +
>  const char *
>  _find_prefix (const char *name)
>  {
> @@ -251,6 +264,25 @@ _find_prefix (const char *name)
>      return "";
>  }
>  
> +static const struct
> +{
> +    /* NOTMUCH_FEATURE_* value. */
> +    unsigned int value;
> +    /* Feature name as it appears in the database.  This name should
> +     * be appropriate for displaying to the user if an older version
> +     * of notmuch doesn't support this feature. */
> +    const char *name;
> +    /* Compatibility flags when this feature is declared. */
> +    const char *flags;
> +} feature_names[] = {
> +    {NOTMUCH_FEATURE_FILE_TERMS,             "file terms", "rw"},
> +    {NOTMUCH_FEATURE_DIRECTORY_DOCS,         "directory documents", "rw"},
> +    /* Header values are not required for reading a database because a
> +     * reader can just refer to the message file. */
> +    {NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES, "from/subject/ID values", "w"},
> +    {NOTMUCH_FEATURE_BOOL_FOLDER,            "boolean folder terms", "rw"},
> +};
> +
>  const char *
>  notmuch_status_to_string (notmuch_status_t status)
>  {
> @@ -591,6 +623,11 @@ notmuch_database_create (const char *path, notmuch_database_t **database)
>  				    &notmuch);
>      if (status)
>  	goto DONE;
> +
> +    /* Upgrade doesn't add this feature to existing databases, but new
> +     * databases have it. */
> +    notmuch->features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES;
> +

Does this mean that if the user wants this feature he has to dump,
delete the existing database, and restore? Is that just because no-one
has implemented upgrade for this feature or is it "hard" in some sense?

A possibly related question: is there likely to be a case where the user
does not want to add/upgrade some feature? Would it be worth having an
explicit upgrade command which let the user choose which features to
upgrade? This is orthogonal to this series: I am just trying to get a
feel for how it will, or could, be used.

Best wishes

Mark

>      status = notmuch_database_upgrade (notmuch, NULL, NULL);
>      if (status) {
>  	notmuch_database_close(notmuch);
> @@ -619,6 +656,80 @@ _notmuch_database_ensure_writable (notmuch_database_t *notmuch)
>      return NOTMUCH_STATUS_SUCCESS;
>  }
>  
> +/* Parse a database features string from the given database version.
> + *
> + * For version < 3, this ignores the features string and returns a
> + * hard-coded set of features.
> + *
> + * If there are unrecognized features that are required to open the
> + * database in mode (which should be 'r' or 'w'), return a
> + * comma-separated list of unrecognized but required features in
> + * *incompat_out, which will be allocated from ctx.
> + */
> +static unsigned int
> +_parse_features (const void *ctx, const char *features, unsigned int version,
> +		 char mode, char **incompat_out)
> +{
> +    unsigned int res = 0, namelen, i;
> +    size_t llen = 0;
> +    const char *flags;
> +
> +    /* Prior to database version 3, features were implied by the
> +     * version number. */
> +    if (version == 0)
> +	return NOTMUCH_FEATURES_V0;
> +    else if (version == 1)
> +	return NOTMUCH_FEATURES_V1;
> +    else if (version == 2)
> +	return NOTMUCH_FEATURES_V2;
> +
> +    /* Parse the features string */
> +    while ((features = strtok_len_c (features + llen, "\n", &llen)) != NULL) {
> +	flags = strchr (features, '\t');
> +	if (! flags || flags > features + llen)
> +	    continue;
> +	namelen = flags - features;
> +
> +	for (i = 0; i < ARRAY_SIZE (feature_names); ++i) {
> +	    if (strlen (feature_names[i].name) == namelen &&
> +		strncmp (feature_names[i].name, features, namelen) == 0) {
> +		res |= feature_names[i].value;
> +		break;
> +	    }
> +	}
> +
> +	if (i == ARRAY_SIZE (feature_names)) {
> +	    /* Unrecognized feature */
> +	    const char *have = strchr (flags, mode);
> +	    if (have && have < features + llen) {
> +		/* This feature is required to access this database in
> +		 * 'mode', but we don't understand it. */
> +		if (! *incompat_out)
> +		    *incompat_out = talloc_strdup (ctx, "");
> +		*incompat_out = talloc_asprintf_append_buffer (
> +		    *incompat_out, "%s%.*s", **incompat_out ? ", " : "",
> +		    namelen, features);
> +	    }
> +	}
> +    }
> +
> +    return res;
> +}
> +
> +static char *
> +_print_features (const void *ctx, unsigned int features)
> +{
> +    unsigned int i;
> +    char *res = talloc_strdup (ctx, "");
> +
> +    for (i = 0; i < ARRAY_SIZE (feature_names); ++i)
> +	if (features & feature_names[i].value)
> +	    res = talloc_asprintf_append_buffer (
> +		res, "%s\t%s\n", feature_names[i].name, feature_names[i].flags);
> +
> +    return res;
> +}
> +
>  notmuch_status_t
>  notmuch_database_open (const char *path,
>  		       notmuch_database_mode_t mode,
> @@ -627,7 +738,7 @@ notmuch_database_open (const char *path,
>      notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
>      void *local = talloc_new (NULL);
>      notmuch_database_t *notmuch = NULL;
> -    char *notmuch_path, *xapian_path;
> +    char *notmuch_path, *xapian_path, *incompat_features;
>      struct stat st;
>      int err;
>      unsigned int i, version;
> @@ -686,39 +797,51 @@ notmuch_database_open (const char *path,
>  	if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE) {
>  	    notmuch->xapian_db = new Xapian::WritableDatabase (xapian_path,
>  							       Xapian::DB_CREATE_OR_OPEN);
> -	    version = notmuch_database_get_version (notmuch);
> -
> -	    if (version > NOTMUCH_DATABASE_VERSION) {
> -		fprintf (stderr,
> -			 "Error: Notmuch database at %s\n"
> -			 "       has a newer database format version (%u) than supported by this\n"
> -			 "       version of notmuch (%u). Refusing to open this database in\n"
> -			 "       read-write mode.\n",
> -			 notmuch_path, version, NOTMUCH_DATABASE_VERSION);
> -		notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY;
> -		notmuch_database_destroy (notmuch);
> -		notmuch = NULL;
> -		status = NOTMUCH_STATUS_FILE_ERROR;
> -		goto DONE;
> -	    }
> -
> -	    if (version < NOTMUCH_DATABASE_VERSION)
> -		notmuch->needs_upgrade = TRUE;
>  	} else {
>  	    notmuch->xapian_db = new Xapian::Database (xapian_path);
> -	    version = notmuch_database_get_version (notmuch);
> -	    if (version > NOTMUCH_DATABASE_VERSION)
> -	    {
> -		fprintf (stderr,
> -			 "Warning: Notmuch database at %s\n"
> -			 "         has a newer database format version (%u) than supported by this\n"
> -			 "         version of notmuch (%u). Some operations may behave incorrectly,\n"
> -			 "         (but the database will not be harmed since it is being opened\n"
> -			 "         in read-only mode).\n",
> -			 notmuch_path, version, NOTMUCH_DATABASE_VERSION);
> -	    }
>  	}
>  
> +	/* Check version.  As of database version 3, we represent
> +	 * changes in terms of features, so assume a version bump
> +	 * means a dramatically incompatible change. */
> +	version = notmuch_database_get_version (notmuch);
> +	if (version > NOTMUCH_DATABASE_VERSION) {
> +	    fprintf (stderr,
> +		     "Error: Notmuch database at %s\n"
> +		     "       has a newer database format version (%u) than supported by this\n"
> +		     "       version of notmuch (%u).\n",
> +		     notmuch_path, version, NOTMUCH_DATABASE_VERSION);
> +	    notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY;
> +	    notmuch_database_destroy (notmuch);
> +	    notmuch = NULL;
> +	    status = NOTMUCH_STATUS_FILE_ERROR;
> +	    goto DONE;
> +	}
> +
> +	/* Check features. */
> +	incompat_features = NULL;
> +	notmuch->features = _parse_features (
> +	    local, notmuch->xapian_db->get_metadata ("features").c_str (),
> +	    version, mode == NOTMUCH_DATABASE_MODE_READ_WRITE ? 'w' : 'r',
> +	    &incompat_features);
> +	if (incompat_features) {
> +	    fprintf (stderr,
> +		     "Error: Notmuch database at %s\n"
> +		     "       requires features (%s)\n"
> +		     "       not supported by this version of notmuch.\n",
> +		     notmuch_path, incompat_features);
> +	    notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY;
> +	    notmuch_database_destroy (notmuch);
> +	    notmuch = NULL;
> +	    status = NOTMUCH_STATUS_FILE_ERROR;
> +	    goto DONE;
> +	}
> +
> +	/* Do we want an upgrade? */
> +	if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE &&
> +	    NOTMUCH_FEATURES_CURRENT & ~notmuch->features)
> +	    notmuch->needs_upgrade = TRUE;
> +
>  	notmuch->last_doc_id = notmuch->xapian_db->get_lastdocid ();
>  	last_thread_id = notmuch->xapian_db->get_metadata ("last_thread_id");
>  	if (last_thread_id.empty ()) {
> @@ -1077,6 +1200,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
>  						   double progress),
>  			  void *closure)
>  {
> +    void *local = talloc_new (NULL);
>      Xapian::WritableDatabase *db;
>      struct sigaction action;
>      struct itimerval timerval;
> @@ -1226,6 +1350,8 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
>  	notmuch_query_destroy (query);
>      }
>  
> +    notmuch->features |= NOTMUCH_FEATURES_CURRENT;
> +    db->set_metadata ("features", _print_features (local, notmuch->features));
>      db->set_metadata ("version", STRINGIFY (NOTMUCH_DATABASE_VERSION));
>      db->flush ();
>  
> @@ -1302,6 +1428,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
>  	sigaction (SIGALRM, &action, NULL);
>      }
>  
> +    talloc_free (local);
>      return NOTMUCH_STATUS_SUCCESS;
>  }
>  
> -- 
> 2.0.0
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

Thread: