Re: [PATCH v3 0/6] Make Emacs search use sexp format

Subject: Re: [PATCH v3 0/6] Make Emacs search use sexp format

Date: Tue, 04 Jun 2013 01:33:30 -0400

To: Jameson Graef Rollins, notmuch@notmuchmail.org

Cc: tomi.ollila@iki.fi

From: Austin Clements


On Sun, 02 Jun 2013, Jameson Graef Rollins <jrollins@finestructure.net> wrote:
> On Sat, Jun 01 2013, David Bremner <david@tethera.net> wrote:
>> Austin Clements <amdragon@MIT.EDU> writes:
>>
>>> This is v3 of id:1369934016-22308-1-git-send-email-amdragon@mit.edu.
>>> This tweaks the shell invocation as suggested by Tomi and fixes two
>>> comment typos pointed out by Mark.  It also adds a NEWS patch.  I'm
>>> going to go ahead and mark this ready because of Tomi's and Mark's
>>> reviews of v2.
>>
>> The first 5 I pushed. The NEWS patch has a conflict.
>
> I'm very happy to see the long-coming sexp handling working here.  Good
> work, folks, particularly to Austin for getting the awesome asynchronous
> processing stuff working.  Searches are now definitely noticeably
> faster.
>
> I am, however, seeing a couple of issues that we might want to address.
>
> * Killing a search buffer that is still in the process of being filled
>   causes errors to be thrown.  I'm seeing both of the following
>   intermittently:
>
> [Sun Jun  2 08:26:40 2013]
> notmuch exited with status killed
> command: notmuch search --format\=sexp --format-version\=1 --sort\=newest-first to\:jrollins
> exit signal: killed
>
> [Sun Jun  2 08:32:26 2013]
> notmuch exited with status hangup
> command: notmuch search --format\=sexp --format-version\=1 --sort\=newest-first to\:jrollins
> exit signal: hangup
>
>   This is somewhat understandable, as the notmuch binary exits with an
>   error if it hasn't finished dumping the output, but given how common
>   this particular scenario is I think we should try to avoid throwing
>   errors in this circumstance.  I wonder if we shouldn't just modify the
>   binary to not return non-zero if it was manually killed while
>   processing the output, or at least special-case the particular error
>   caused by manually killing the search.

Your assessment is correct, of course.  The right place to fix this is
in Emacs, not the CLI (the CLI *can't* do anything about this, since it
gets killed by a signal).  Probably we should do something different in
the sentinel if the search process's buffer is no longer live.  Clearly
we should suppress the status error for the signal, but I think we still
should report anything that appeared in err-file because it may be
relevant to why the user killed the buffer (e.g., maybe a notmuch
wrapper was blocked on something).

> * The next thing I'm seeing is this:
>
> Opening input file: no such file or directory, /home/jrollins/tmp/nmerr5390CAY
>
>   I'm not exactly sure what causes this error, but it looks to me like
>   the temporary error file was removed before we were finished with it.

This one's pretty awesome (and I think is a bug in Emacs).  At a high
level, the sentinel is getting run twice and since the first call
deletes the error file, the second call fails.  At a low level, what
causes this is fascinating.

1) You kill the search buffer.  This invokes kill_buffer_processes,
   which sends a SIGHUP to notmuch, but doesn't do anything else.
   Meanwhile, the notmuch search process has printed some more output,
   but Emacs hasn't consumed it yet (this is critical).

2) Emacs gets a SIGCHLD from the dying notmuch process, which invokes
   handle_child_signal, which sets the new process status, but can't do
   anything else because it's a signal handler.

3) Emacs returns to its idle loop, which calls status_notify, which sees
   that the notmuch process has a new status.  This is where things get
   interesting.

3.1) Emacs guarantees that it will run process filters on any unconsumed
     output before running the process sentinel, so status_notify calls
     read_process_output, which consumes the final output and calls
     notmuch-search-process-filter.

3.1.1) notmuch-search-process-filter contains code to check if the
       search buffer is still alive and, since it's not, it calls
       delete-process.

3.1.1.1) delete-process correctly sees that the process is already dead
         and doesn't try to send another signal, *but* it still modifies
         the status to "killed".  To deal with the new status, it calls
         status_notify.  Dun dun dun.  We've seen this function before.

3.1.1.1.1) The *recursive* status_notify invocation sees that the
           process has a new status and doesn't have any more output to
           consume, so it invokes our sentinel and returns.

3.2) The outer status_notify call (which we're still in) is now done
     flushing pending process output, so it *also* invokes our sentinel.

It might be that the answer is to just remove the delete-process call
from the filter.  It seems completely redundant (and racy) with Emacs'
automatic SIGHUP'ing.

> * Finally, something happened that caused *12,000* of the following lines
>   to be sent to the *Notmuch errors* buffer:
>
> A Xapian exception occurred performing query: The revision being read has been discarded - you should call Xapian::Database::reopen() and retry the operation
>
>   Again, this was related to killing a search buffer that was still
>   being filled. I'm pretty sure the database was not modified during
>   this process.

I have no insight on this one.  My best guess is that this has nothing
to do with this change except that this change makes these warnings
visible rather than burying them somewhere down in the search results
buffer.

> Let me know if I can help provide any more info.
>
> jamie.

Thread: