On Mon, Jun 03 2013, Austin Clements <amdragon@MIT.EDU> wrote:
>> * Killing a search buffer that is still in the process of being filled
>> causes errors to be thrown. I'm seeing both of the following
>> intermittently:
>>
>> [Sun Jun 2 08:26:40 2013]
>> notmuch exited with status killed
>> command: notmuch search --format\=sexp --format-version\=1 --sort\=newest-first to\:jrollins
>> exit signal: killed
>>
>> [Sun Jun 2 08:32:26 2013]
>> notmuch exited with status hangup
>> command: notmuch search --format\=sexp --format-version\=1 --sort\=newest-first to\:jrollins
>> exit signal: hangup
>>
>> This is somewhat understandable, as the notmuch binary exits with an
>> error if it hasn't finished dumping the output, but given how common
>> this particular scenario is I think we should try to avoid throwing
>> errors in this circumstance. I wonder if we shouldn't just modify the
>> binary to not return non-zero if it was manually killed while
>> processing the output, or at least special-case the particular error
>> caused by manually killing the search.
>
> Your assessment is correct, of course. The right place to fix this is
> in Emacs, not the CLI (the CLI *can't* do anything about this, since it
> gets killed by a signal). Probably we should do something different in
> the sentinel if the search process's buffer is no longer live. Clearly
> we should suppress the status error for the signal, but I think we still
> should report anything that appeared in err-file because it may be
> relevant to why the user killed the buffer (e.g., maybe a notmuch
> wrapper was blocked on something).

That seems like a reasonable approach to me, to suppress the error but
continue to report in *Notmuch errors* buffer.

>> * The next thing I'm seeing is this:
>>
>> Opening input file: no such file or directory, /home/jrollins/tmp/nmerr5390CAY
>>
>> I'm not exactly sure what causes this error, but it looks to me like
>> the temporary error file was removed before we were finished with it.
>
> This one's pretty awesome (and I think is a bug in Emacs). At a high
> level, the sentinel is getting run twice and since the first call
> deletes the error file, the second call fails. At a low level, what
> causes this is fascinating.
>
> 1) You kill the search buffer. This invokes kill_buffer_processes,
>    which sends a SIGHUP to notmuch, but doesn't do anything else.
>    Meanwhile, the notmuch search process has printed some more output,
>    but Emacs hasn't consumed it yet (this is critical).
>
> 2) Emacs gets a SIGCHLD from the dying notmuch process, which invokes
>    handle_child_signal, which sets the new process status, but can't do
>    anything else because it's a signal handler.
>
> 3) Emacs returns to its idle loop, which calls status_notify, which sees
>    that the notmuch process has a new status. This is where things get
>    interesting.
>
> 3.1) Emacs guarantees that it will run process filters on any unconsumed
>      output before running the process sentinel, so status_notify calls
>      read_process_output, which consumes the final output and calls
>      notmuch-search-process-filter.
>
> 3.1.1) notmuch-search-process-filter contains code to check if the
>        search buffer is still alive and, since it's not, it calls
>        delete-process.
>
> 3.1.1.1) delete-process correctly sees that the process is already dead
>          and doesn't try to send another signal, *but* it still modifies
>          the status to "killed". To deal with the new status, it calls
>          status_notify. Dun dun dun. We've seen this function before.
>
> 3.1.1.1.1) The *recursive* status_notify invocation sees that the
>            process has a new status and doesn't have any more output to
>            consume, so it invokes our sentinel and returns.
>
> 3.2) The outer status_notify call (which we're still in) is now done
>      flushing pending process output, so it *also* invokes our sentinel.
>
> It might be that the answer is to just remove the delete-process call
> from the filter.
> It seems completely redundant (and racy) with Emacs'
> automatic SIGHUP'ing.

Wow, awesome detective work. As mentioned on IRC, this suggestion of
Austin's does seem to fix the problem:

diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index 5a8c957..975ef2b 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -817,7 +817,7 @@ non-authors is found, assume that all of the authors match."
         (inhibit-read-only t)
         done)
     (if (not (buffer-live-p results-buf))
-        (delete-process proc)
+        t
       (with-current-buffer parse-buf
         ;; Insert new data
         (save-excursion

I'm not sure if this is the ultimate solution, but it does cause the
missing tmp file errors to go away.

>> * Finally, something happened that caused *12,000* of the following lines
>> to be sent to the *Notmuch errors* buffer:
>>
>> A Xapian exception occurred performing query: The revision being read has been discarded - you should call Xapian::Database::reopen() and retry the operation
>>
>> Again, this was related to killing a search buffer that was still
>> being filled. I'm pretty sure the database was not modified during
>> this process.
>
> I have no insight on this one. My best guess is that this has nothing
> to do with this change except that this change makes these warnings
> visible rather than burying them somewhere down in the search results
> buffer.

Yeah, I suspected as much as well.

jamie.