Re: Notmuch DB Problems

Subject: Re: Notmuch DB Problems

Date: Fri, 07 Sep 2018 08:32:03 -0700

To: Jani Nikula,


From: Mueen Nawaz

Jani Nikula writes:

> It might be interesting to see an strace log to possibly get an idea
> where it gets stuck.
> Is the filesystem writable and working okay?
> If search and show work, I'm guessing it gets stuck in trying to open
> the database writable. One hackish idea is to patch notmuch dump to open
> the database in read-only mode, and dump the tags. See below. The dump
> command opens the database writable to prevent changes while
> dumping. (Arguably this could be a command line option for cases like
> yours.)

Thanks - your patch worked. I dumped all the tags, deleted the database,
rebuilt it and restored the tags. All was well.

Until the following day, that is. At noon I noticed the problem was
back. By evening, I could not even run queries - the database would not
open even in read-only mode. It was dead.

After a lot of poking around, I figured out the problem, and this may be
of interest to the developers (although I am not sure whether it is a
Xapian issue or a notmuch issue).

Here's why it would freeze:

I have a post-new hook that runs a Python script. Depending on whether
the new email it is processing matches a rule I have, it will fire off
an email to the sender using the SMTP library in Python.

I had recently upgraded my MTA (Postfix), and the new version had a
backward-incompatible change that broke my config. I don't know why, but
I could still send emails via Emacs, whereas when I tried to send them
via Python, Postfix would log an error and not send the message. The
Python statement would freeze (I guess Postfix doesn't return an
appropriate response? Not sure why).
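
For illustration, a minimal sketch of how a hook like this could keep
the SMTP call from blocking forever. It assumes the hang came from a
socket operation with no timeout (smtplib waits indefinitely unless a
timeout is passed or a global socket default is set); the actual cause
was not confirmed. The function name send_reply, its parameters and the
subject line are hypothetical and not taken from the real script.

```python
import smtplib
from email.message import EmailMessage


def send_reply(host, port, sender, recipient, body, timeout=30.0):
    """Send one plain-text message; return False instead of hanging."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Automatic reply"  # hypothetical subject line
    msg.set_content(body)
    try:
        # The timeout covers the connection attempt and every later
        # blocking socket operation, so a silent server cannot stall
        # the hook indefinitely.
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.send_message(msg)
    except (OSError, smtplib.SMTPException) as exc:
        print(f"mail to {recipient} not sent: {exc}")
        return False
    return True
```

With a timeout in place, a misbehaving MTA costs the hook at most a
bounded wait per blocking step rather than leaving "notmuch new" stuck.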

I have a cron job to run "notmuch new" 3 times an hour. Since the hook
was frozen, so was the notmuch new command, and I ended up with quite a
lot of stuck "notmuch new" processes. I assume this meant the DB was
locked for writing all this time.
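
One way to keep cron from stacking up runs like this is to have the job
go through a small wrapper that skips a run while the previous one is
still alive. A minimal sketch, assuming a Unix system where fcntl.flock
is available; the lock path and the function names are hypothetical.

```python
import fcntl
import subprocess


def acquire_lock(path):
    """Return an open, exclusively locked file, or None if already locked."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_notmuch_new(lock_path="/tmp/notmuch-new.lock"):
    """Run "notmuch new" unless a previous run still holds the lock."""
    lock = acquire_lock(lock_path)
    if lock is None:
        print("previous notmuch new still running, skipping this run")
        return False
    try:
        subprocess.run(["notmuch", "new"], check=False)
        return True
    finally:
        lock.close()  # closing the file releases the lock
```

This does not unfreeze a hung hook; it only ensures that a single stuck
process does not turn into dozens.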

Now killing all those jobs did not fix the database. It was still
broken. And as we saw the second time round, it was /really/ broken - it
would not even open in read-only mode.

It is scary that if a post-new hook freezes while the database is
locked, it could (eventually) clobber the database. Is there anything
notmuch can do to prevent this outcome?

BTW, I think the DB would die only after a while. In my experiments, if
I killed the hook soon (e.g. under 1 minute), the database seemed fine. 
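
Given that observation, a hook could guard itself by running its slow
part under a deadline. A minimal sketch, assuming the slow work lives in
a separate command; run_with_deadline is a hypothetical helper, not
something notmuch provides.

```python
import subprocess


def run_with_deadline(command, seconds):
    """Run command; kill it and return None if it exceeds the deadline."""
    try:
        result = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
    return result.returncode
```

Note that only the direct child is killed on timeout; anything that
child spawned itself would need separate handling.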

Don't use a big word where a diminutive one will suffice.

                    /\  /\               /\  /
                   /  \/  \ u e e n     /  \/  a w a z

notmuch mailing list