Re: [PATCH 1/3] database: Add notmuch_database_compact_close

Subject: Re: [PATCH 1/3] database: Add notmuch_database_compact_close

Date: Sun, 01 Sep 2013 18:43:03 +0300

To: Ben Gamari, notmuch@notmuchmail.org

Cc:

From: Jani Nikula


On Sat, 24 Aug 2013, Ben Gamari <bgamari.foss@gmail.com> wrote:
> This function uses Xapian's Compactor machinery to compact the notmuch
> database. The compacted database is built in a temporary directory and
> later moved into place while the original uncompacted database is
> preserved.
>
> Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
> ---
>  configure       | 27 +++++++++++++++--
>  lib/database.cc | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/notmuch.h   | 15 ++++++++++
>  3 files changed, 130 insertions(+), 3 deletions(-)
>
> diff --git a/configure b/configure
> index 6166917..ee9e887 100755
> --- a/configure
> +++ b/configure
> @@ -277,7 +277,8 @@ printf "Checking for Xapian development files... "
>  have_xapian=0
>  for xapian_config in ${XAPIAN_CONFIG}; do
>      if ${xapian_config} --version > /dev/null 2>&1; then
> -	printf "Yes (%s).\n" $(${xapian_config} --version | sed -e 's/.* //')
> +	xapian_version=$(${xapian_config} --version | sed -e 's/.* //')
> +	printf "Yes (%s).\n" ${xapian_version}
>  	have_xapian=1
>  	xapian_cxxflags=$(${xapian_config} --cxxflags)
>  	xapian_ldflags=$(${xapian_config} --libs)
> @@ -289,6 +290,21 @@ if [ ${have_xapian} = "0" ]; then
>      errors=$((errors + 1))
>  fi
>  
> +# Compaction is only supported on Xapian > 1.2.6
> +have_xapian_compact=0
> +if [ ${have_xapian} = "1" ]; then
> +    printf "Checking for Xapian compact support... "
> +    case "${xapian_version}" in
> +        0.*|1.[01].*|1.2.[0-5])
> +            printf "No (only available with Xapian > 1.2.6).\n" ;;
> +        [1-9]*.[0-9]*.[0-9]*)
> +            have_xapian_compact=1
> +            printf "Yes.\n" ;;
> +        *)
> +            printf "Unknown version.\n" ;;
> +    esac
> +fi
> +
>  printf "Checking for GMime development files... "
>  have_gmime=0
>  IFS=';'
> @@ -729,6 +745,9 @@ HAVE_STRCASESTR = ${have_strcasestr}
>  # build its own version)
>  HAVE_STRSEP = ${have_strsep}
>  
> +# Whether the Xapian version in use supports compaction
> +HAVE_XAPIAN_COMPACT = ${have_xapian_compact}
> +
>  # Whether the getpwuid_r function is standards-compliant
>  # (if not, then notmuch will #define _POSIX_PTHREAD_SEMANTICS
>  # to enable the standards-compliant version -- needed for Solaris)
> @@ -787,13 +806,15 @@ CONFIGURE_CFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)      \\
>  		   -DHAVE_STRCASESTR=\$(HAVE_STRCASESTR)                 \\
>  		   -DHAVE_STRSEP=\$(HAVE_STRSEP)                         \\
>  		   -DSTD_GETPWUID=\$(STD_GETPWUID)                       \\
> -		   -DSTD_ASCTIME=\$(STD_ASCTIME)
> +		   -DSTD_ASCTIME=\$(STD_ASCTIME)                         \\
> +                   -DHAVE_XAPIAN_COMPACT=\$(HAVE_XAPIAN_COMPACT)
>  CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)    \\
>  		     \$(TALLOC_CFLAGS) -DHAVE_VALGRIND=\$(HAVE_VALGRIND) \\
>  		     \$(VALGRIND_CFLAGS) \$(XAPIAN_CXXFLAGS)             \\
>  		     -DHAVE_STRCASESTR=\$(HAVE_STRCASESTR)               \\
>  		     -DHAVE_STRSEP=\$(HAVE_STRSEP)                       \\
>  		     -DSTD_GETPWUID=\$(STD_GETPWUID)                     \\
> -		     -DSTD_ASCTIME=\$(STD_ASCTIME)
> +		     -DSTD_ASCTIME=\$(STD_ASCTIME)                       \\
> +                     -DHAVE_XAPIAN_COMPACT=\$(HAVE_XAPIAN_COMPACT)
>  CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(XAPIAN_LDFLAGS)
>  EOF
> diff --git a/lib/database.cc b/lib/database.cc
> index 5cc0765..b71751d 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -268,6 +268,8 @@ notmuch_status_to_string (notmuch_status_t status)
>  	return "Unbalanced number of calls to notmuch_message_freeze/thaw";
>      case NOTMUCH_STATUS_UNBALANCED_ATOMIC:
>  	return "Unbalanced number of calls to notmuch_database_begin_atomic/end_atomic";
> +    case NOTMUCH_STATUS_UNSUPPORTED_OPERATION:
> +	return "Unsupported operation";
>      default:
>      case NOTMUCH_STATUS_LAST_STATUS:
>  	return "Unknown error status value";
> @@ -800,6 +802,95 @@ notmuch_database_close (notmuch_database_t *notmuch)
>      notmuch->date_range_processor = NULL;
>  }
>  
> +class NotmuchCompactor : public Xapian::Compactor
> +{
> +public:
> +    virtual void
> +    set_status (const std::string &table, const std::string &status)
> +    {
> +	if (status.length() == 0)
> +	    printf ("compacting table %s:\n", table.c_str());
> +	else
> +	    printf ("     %s\n", status.c_str());
> +    }

We're trying to reduce the amount of prints directly from libnotmuch,
not increase. This applies here as well as below.

> +};
> +
> +#if HAVE_XAPIAN_COMPACT
> +notmuch_status_t
> +notmuch_database_compact_close (notmuch_database_t *notmuch)
> +{
> +    void *local = talloc_new (NULL);
> +    NotmuchCompactor compactor;
> +    char *notmuch_path, *xapian_path, *compact_xapian_path, *old_xapian_path;
> +    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
> +
> +    if (! (notmuch_path = talloc_asprintf (local, "%s/%s", notmuch->path, ".notmuch"))) {
> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
> +	goto DONE;
> +    }
> +
> +    if (! (xapian_path = talloc_asprintf (local, "%s/%s", notmuch_path, "xapian"))) {
> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
> +	goto DONE;
> +    }
> +
> +    if (! (compact_xapian_path = talloc_asprintf (local, "%s.compact", xapian_path))) {
> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
> +	goto DONE;
> +    }
> +
> +    if (! (old_xapian_path = talloc_asprintf (local, "%s.old", xapian_path))) {
> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
> +	goto DONE;
> +    }
> +
> +    try {
> +	compactor.set_renumber(false);
> +	compactor.add_source(xapian_path);
> +	compactor.set_destdir(compact_xapian_path);
> +	compactor.compact();
> +
> +	if (rename(xapian_path, old_xapian_path)) {
> +	    fprintf (stderr, "Error moving old database out of the way\n");
> +	    ret = NOTMUCH_STATUS_FILE_ERROR;
> +	    goto DONE;
> +	}

This fails if old_xapian_path exists.

> +
> +	if (rename(compact_xapian_path, xapian_path)) {
> +	    fprintf (stderr, "Error moving compacted database\n");
> +	    ret = NOTMUCH_STATUS_FILE_ERROR;
> +	    goto DONE;
> +	}
> +    } catch (Xapian::InvalidArgumentError e) {
> +	fprintf (stderr, "Error while compacting: %s", e.get_msg().c_str());
> +	ret = NOTMUCH_STATUS_XAPIAN_EXCEPTION;
> +	goto DONE;
> +    }
> +
> +    fprintf (stderr, "\n");
> +    fprintf (stderr, "\n");
> +    fprintf (stderr, "Old database has been moved to %s", old_xapian_path);
> +    fprintf (stderr, "\n");
> +    fprintf (stderr, "To delete run,\n");
> +    fprintf (stderr, "\n");
> +    fprintf (stderr, "    rm -R %s\n", old_xapian_path);
> +    fprintf (stderr, "\n");
> +
> +    notmuch_database_close(notmuch);
> +
> +DONE:
> +    talloc_free(local);

The database does not get closed on errors. If that's intentional, it
should be documented.

> +    return ret;
> +}
> +#else
> +notmuch_status_t
> +notmuch_database_compact_close (unused (notmuch_database_t *notmuch))
> +{
> +    fprintf (stderr, "notmuch was compiled against a xapian version lacking compaction support.\n");
> +    return NOTMUCH_STATUS_UNSUPPORTED_OPERATION;
> +}
> +#endif
> +
>  void
>  notmuch_database_destroy (notmuch_database_t *notmuch)
>  {
> diff --git a/lib/notmuch.h b/lib/notmuch.h
> index 998a4ae..e9abd90 100644
> --- a/lib/notmuch.h
> +++ b/lib/notmuch.h
> @@ -101,6 +101,7 @@ typedef enum _notmuch_status {
>      NOTMUCH_STATUS_TAG_TOO_LONG,
>      NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW,
>      NOTMUCH_STATUS_UNBALANCED_ATOMIC,
> +    NOTMUCH_STATUS_UNSUPPORTED_OPERATION,
>  
>      NOTMUCH_STATUS_LAST_STATUS
>  } notmuch_status_t;
> @@ -215,6 +216,20 @@ notmuch_database_open (const char *path,
>  void
>  notmuch_database_close (notmuch_database_t *database);
>  
> +/* Close the given notmuch database and then compact it.

The implementation first compacts then closes.

> + * After notmuch_database_close_compact has been called, calls to
> + * other functions on objects derived from this database may either
> + * behave as if the database had not been closed (e.g., if the
> + * required data has been cached) or may fail with a
> + * NOTMUCH_STATUS_XAPIAN_EXCEPTION.
> + *
> + * notmuch_database_close_compact can be called multiple times.  Later
> + * calls have no effect.

This is not true. The Xapian compactor does not require the database to
be open. It will happily open the database read-only and compact the
database again if database has been closed.

> + */
> +notmuch_status_t
> +notmuch_database_compact_close (notmuch_database_t *notmuch);

I'm afraid we really need to re-think the API.

I see that your CLI 'notmuch compact' command opens the database
read-write, I assume to ensure there are no other writers, so that stuff
doesn't get lost. However if you pass a read-write database that has
been modified, I believe the changes will get lost (as Xapian opens the
database read-only). We shouldn't let the API users shoot themselves in
the foot so easily.

I think I'd go for something like:

notmuch_status_t
notmuch_database_compact (const char *path);

or

notmuch_status_t
notmuch_database_compact (const char *path, const char *backup);

which would internally open the database as read-write to ensure no
modifications, compact, and close. If backup != NULL, it would save the
old database there (same mounted file system as the database is a fine
limitation), otherwise remove.

Even then I'm not completely sure what Xapian WritableDatabase will do
on close when the database has been switched underneath it. But moving
the database after it has been closed has a race condition too.


BR,
Jani.



> +
>  /* Destroy the notmuch database, closing it if necessary and freeing
>  * all associated resources. */
>  void
> -- 
> 1.8.1.2
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

-- 
Jani

Thread: