Re: [PATCH v3 0/6] Make Emacs search use sexp format

Subject: Re: [PATCH v3 0/6] Make Emacs search use sexp format

Date: Wed, 05 Jun 2013 08:21:59 -0700

To: Austin Clements, notmuch@notmuchmail.org

Cc: tomi.ollila@iki.fi

From: Jameson Graef Rollins


On Mon, Jun 03 2013, Austin Clements <amdragon@MIT.EDU> wrote:
>> * Killing a search buffer that is still in the process of being filled
>>   causes errors to be thrown.  I'm seeing both of the following
>>   intermittently:
>>
>> [Sun Jun  2 08:26:40 2013]
>> notmuch exited with status killed
>> command: notmuch search --format\=sexp --format-version\=1 --sort\=newest-first to\:jrollins
>> exit signal: killed
>>
>> [Sun Jun  2 08:32:26 2013]
>> notmuch exited with status hangup
>> command: notmuch search --format\=sexp --format-version\=1 --sort\=newest-first to\:jrollins
>> exit signal: hangup
>>
>>   This is somewhat understandable, as the notmuch binary exits with an
>>   error if it hasn't finished dumping the output, but given how common
>>   this particular scenario is I think we should try to avoid throwing
>>   errors in this circumstance.  I wonder if we shouldn't just modify the
>>   binary to not return non-zero if it was manually killed while
>>   processing the output, or at least special-case the particular error
>>   caused by manually killing the search.
>
> Your assessment is correct, of course.  The right place to fix this is
> in Emacs, not the CLI (the CLI *can't* do anything about this, since it
> gets killed by a signal).  Probably we should do something different in
> the sentinel if the search process's buffer is no longer live.  Clearly
> we should suppress the status error for the signal, but I think we still
> should report anything that appeared in err-file because it may be
> relevant to why the user killed the buffer (e.g., maybe a notmuch
> wrapper was blocked on something).

That seems like a reasonable approach to me, to suppress the error but
continue to report in *Notmuch errors* buffer.

>> * The next thing I'm seeing is this:
>>
>> Opening input file: no such file or directory, /home/jrollins/tmp/nmerr5390CAY
>>
>>   I'm not exactly sure what causes this error, but it looks to me like
>>   the temporary error file was removed before we were finished with it.
>
> This one's pretty awesome (and I think is a bug in Emacs).  At a high
> level, the sentinel is getting run twice and since the first call
> deletes the error file, the second call fails.  At a low level, what
> causes this is fascinating.
>
> 1) You kill the search buffer.  This invokes kill_buffer_processes,
>    which sends a SIGHUP to notmuch, but doesn't do anything else.
>    Meanwhile, the notmuch search process has printed some more output,
>    but Emacs hasn't consumed it yet (this is critical).
>
> 2) Emacs gets a SIGCHLD from the dying notmuch process, which invokes
>    handle_child_signal, which sets the new process status, but can't do
>    anything else because it's a signal handler.
>
> 3) Emacs returns to its idle loop, which calls status_notify, which sees
>    that the notmuch process has a new status.  This is where things get
>    interesting.
>
> 3.1) Emacs guarantees that it will run process filters on any unconsumed
>      output before running the process sentinel, so status_notify calls
>      read_process_output, which consumes the final output and calls
>      notmuch-search-process-filter.
>
> 3.1.1) notmuch-search-process-filter contains code to check if the
>        search buffer is still alive and, since it's not, it calls
>        delete-process.
>
> 3.1.1.1) delete-process correctly sees that the process is already dead
>          and doesn't try to send another signal, *but* it still modifies
>          the status to "killed".  To deal with the new status, it calls
>          status_notify.  Dun dun dun.  We've seen this function before.
>
> 3.1.1.1.1) The *recursive* status_notify invocation sees that the
>            process has a new status and doesn't have any more output to
>            consume, so it invokes our sentinel and returns.
>
> 3.2) The outer status_notify call (which we're still in) is now done
>      flushing pending process output, so it *also* invokes our sentinel.
>
> It might be that the answer is to just remove the delete-process call
> from the filter.  It seems completely redundant (and racy) with Emacs'
> automatic SIGHUP'ing.

Wow, awesome detective work.  As mentioned on IRC, this suggestion of
Austin's does seem to fix the problem:

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index 5a8c957..975ef2b 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -817,7 +817,7 @@ non-authors is found, assume that all of the authors match."
        (inhibit-read-only t)
        done)
     (if (not (buffer-live-p results-buf))
-       (delete-process proc)
+       t
       (with-current-buffer parse-buf
        ;; Insert new data
        (save-excursion


I'm not sure if this is the ultimate solution, but it does cause the
missing tmp file errors to go away.

>> * Finally, something happened that caused *12,000* of the following lines
>>   to be sent to the *Notmuch errors* buffer:
>>
>> A Xapian exception occurred performing query: The revision being read has been discarded - you should call Xapian::Database::reopen() and retry the operation
>>
>>   Again, this was related to killing a search buffer that was still
>>   being filled. I'm pretty sure the database was not modified during
>>   this process.
>
> I have no insight on this one.  My best guess is that this has nothing
> to do with this change except that this change makes these warnings
> visible rather than burying them somewhere down in the search results
> buffer.

Yeah, I suspected as much as well.

jamie.
part-000.sig (application/pgp-signature)

Thread: