Jani Nikula <jani@nikula.org> writes: > On Sat, 24 Aug 2013, Ben Gamari <bgamari.foss@gmail.com> wrote: >> This function uses Xapian's Compactor machinery to compact the notmuch >> database. The compacted database is built in a temporary directory and >> later moved into place while the original uncompacted database is >> preserved. >> snip >> >> +class NotmuchCompactor : public Xapian::Compactor >> +{ >> +public: >> + virtual void >> + set_status (const std::string &table, const std::string &status) >> + { >> + if (status.length() == 0) >> + printf ("compacting table %s:\n", table.c_str()); >> + else >> + printf (" %s\n", status.c_str()); >> + } > > We're trying to reduce the amount of prints directly from libnotmuch, > not increase. This applies here as well as below. > Fair enough. That being said, I think that status updates are fairly important given that the compaction process can be rather long. Would the preferred interface be to provide notmuch_database_compact_close with a progress callback? >> +}; >> + >> +#if HAVE_XAPIAN_COMPACT >> +notmuch_status_t >> +notmuch_database_compact_close (notmuch_database_t *notmuch) >> +{ >> + void *local = talloc_new (NULL); >> + NotmuchCompactor compactor; >> + char *notmuch_path, *xapian_path, *compact_xapian_path, *old_xapian_path; >> + notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; >> + >> + if (! (notmuch_path = talloc_asprintf (local, "%s/%s", notmuch->path, ".notmuch"))) { >> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + if (! (xapian_path = talloc_asprintf (local, "%s/%s", notmuch_path, "xapian"))) { >> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + if (! (compact_xapian_path = talloc_asprintf (local, "%s.compact", xapian_path))) { >> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + if (! (old_xapian_path = talloc_asprintf (local, "%s.old", xapian_path))) { >> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + try { >> + compactor.set_renumber(false); >> + compactor.add_source(xapian_path); >> + compactor.set_destdir(compact_xapian_path); >> + compactor.compact(); >> + >> + if (rename(xapian_path, old_xapian_path)) { >> + fprintf (stderr, "Error moving old database out of the way\n"); >> + ret = NOTMUCH_STATUS_FILE_ERROR; >> + goto DONE; >> + } > > This fails if old_xapian_path exists. > Ouch, yes, you are right. I suspect the right way forward here will be to check whether old_xapian_path exists before beginning compaction, allowing the user to fix the situation before it fails after finishing what might be a pretty long process. >> + >> + if (rename(compact_xapian_path, xapian_path)) { >> + fprintf (stderr, "Error moving compacted database\n"); >> + ret = NOTMUCH_STATUS_FILE_ERROR; >> + goto DONE; >> + } >> + } catch (Xapian::InvalidArgumentError e) { >> + fprintf (stderr, "Error while compacting: %s", e.get_msg().c_str()); >> + ret = NOTMUCH_STATUS_XAPIAN_EXCEPTION; >> + goto DONE; >> + } >> + >> + fprintf (stderr, "\n"); >> + fprintf (stderr, "\n"); >> + fprintf (stderr, "Old database has been moved to %s", old_xapian_path); >> + fprintf (stderr, "\n"); >> + fprintf (stderr, "To delete run,\n"); >> + fprintf (stderr, "\n"); >> + fprintf (stderr, " rm -R %s\n", old_xapian_path); >> + fprintf (stderr, "\n"); >> + >> + notmuch_database_close(notmuch); >> + >> +DONE: >> + talloc_free(local); > > The database does not get closed on errors. If that's intentional, it > should be documented. > I had reasons for this but they have long fled my memory. Regardless of what it does, this behavior should be documented. I'll take care of this. >> + return ret; >> +} >> +#else >> +notmuch_status_t >> +notmuch_database_compact_close (unused (notmuch_database_t *notmuch)) >> +{ >> + fprintf (stderr, "notmuch was compiled against a xapian version lacking compaction support.\n"); >> + return NOTMUCH_STATUS_UNSUPPORTED_OPERATION; >> +} >> +#endif >> + >> void >> notmuch_database_destroy (notmuch_database_t *notmuch) >> { >> diff --git a/lib/notmuch.h b/lib/notmuch.h >> index 998a4ae..e9abd90 100644 >> --- a/lib/notmuch.h >> +++ b/lib/notmuch.h >> @@ -101,6 +101,7 @@ typedef enum _notmuch_status { >> NOTMUCH_STATUS_TAG_TOO_LONG, >> NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW, >> NOTMUCH_STATUS_UNBALANCED_ATOMIC, >> + NOTMUCH_STATUS_UNSUPPORTED_OPERATION, >> >> NOTMUCH_STATUS_LAST_STATUS >> } notmuch_status_t; >> @@ -215,6 +216,20 @@ notmuch_database_open (const char *path, >> void >> notmuch_database_close (notmuch_database_t *database); >> >> +/* Close the given notmuch database and then compact it. > > The implementation first compacts then closes. > >> + * After notmuch_database_close_compact has been called, calls to >> + * other functions on objects derived from this database may either >> + * behave as if the database had not been closed (e.g., if the >> + * required data has been cached) or may fail with a >> + * NOTMUCH_STATUS_XAPIAN_EXCEPTION. >> + * >> + * notmuch_database_close_compact can be called multiple times. Later >> + * calls have no effect. > > This is not true. The Xapian compactor does not require the database to > be open. It will happily open the database read-only and compact the > database again if database has been closed. > >> + */ >> +notmuch_status_t >> +notmuch_database_compact_close (notmuch_database_t *notmuch); > > I'm afraid we really need to re-think the API. > It seems you are right. When writing this interface it was clear that there would be a number of opportunities for misuse. I was hoping by combining compact and close some of these would be eliminated but clearly this isn't enough. > I see that your CLI 'notmuch compact' command opens the database > read-write, I assume to ensure there are no other writers, so that stuff > doesn't get lost. However if you pass a read-write database that has > been modified, I believe the changes will get lost (as Xapian opens the > database read-only). We shouldn't let the API users shoot themselves in > the foot so easily. > That is correct; the read-write database was an attempt to force the user to exclusively lock the database they are trying to compact. It seems that things can go quite wrong[1] when a database is modified during compaction. There was a suggestion in that thread to add an option to lock the database during compaction. Perhaps it might be worth bringing this up again with Xapian upstream. I think we agree that it would be a poor idea to merge compaction functionality without having a mechanism for ensuring data integrity, especially since many users invoke notmuch in a cron job. > I think I'd go for something like: > > notmuch_status_t > notmuch_database_compact (const char *path); > > or > > notmuch_status_t > notmuch_database_compact (const char *path, const char *backup); > > which would internally open the database as read-write to ensure no > modifications, compact, and close. If backup != NULL, it would save the > old database there (same mounted file system as the database is a fine > limitation), otherwise remove. > This sounds fine to me. > Even then I'm not completely sure what Xapian WritableDatabase will do > on close when the database has been switched underneath it. But moving > the database after it has been closed has a race condition too. > Good points. Not sure what the least evil way about this is. Hopefully Xapian's close operation really does just close file handles. Cheers, - Ben [1] http://lists.xapian.org/pipermail/xapian-discuss/2011-July/008310.html