T350-crypto T357-index-decryption: possible race condition?

Date: Thu, 11 May 2023 14:59:03 +0200

To: notmuch@notmuchmail.org

From: Michael J Gruber


Hi there,

my regular notmuch test builds recently started to fail. More
precisely, the test suite fails because some subtests are killed.
Building notmuch 0.37 with my usual spec file as a rawhide mock build
(a local chroot for the development "version" of Fedora which will
become Fedora 39), I see:
```
T350-crypto: Testing PGP/MIME signature verification and decryption
 PASS   emacs delivery of signed message via fcc
 PASS   emacs delivery of signed message via fcc and smtp
 PASS   signed part content-type indexing
 PASS   signature verification
 PASS   detection of modified signed contents
 PASS   corrupted pgp/mime signature
 PASS   signature verification without full user ID validity
 PASS   signature verification with signer key unavailable
```
At this point the suite "hangs" for about 2 minutes, followed by
```
FATAL: /builddir/build/BUILD/notmuch-0.37/test/T350-crypto.sh:
interrupted by signal 15
```
It then proceeds until
```
T357-index-decryption: Testing indexing decrypted mail
```
and hangs again for about 2 minutes, followed by
```
FATAL: /builddir/build/BUILD/notmuch-0.37/test/T357-index-decryption.sh:
interrupted by signal 15
```
In the end, the suite complains:
```
'/builddir/build/BUILD/notmuch-0.37/test/test-results/T350-crypto'
does not exist!
'/builddir/build/BUILD/notmuch-0.37/test/test-results/T357-index-decryption'
does not exist!
```
At least for T350 this is strange, because several subtests ran and
passed! This suggests a race or a faulty signal trap.
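
For illustration, here is a minimal shell sketch of one way such a
picture could come about. It is not taken from notmuch's test-lib.sh:
the script name, the file names and the assumption that a test writes
its results file only at the very end are all hypothetical. A test
that hangs is terminated by `timeout` with signal 15, its trap prints a
FATAL line, and the results file is never created even though an
earlier subtest passed.
```shell
#!/bin/sh
# Hypothetical stand-in for a test run; NOT notmuch's actual test-lib.sh.
results_dir=$(mktemp -d)
results_file="$results_dir/T000-demo"

# A fake test script: one passing subtest, then a hang.
cat > "$results_dir/demo-test.sh" <<'EOF'
#!/bin/sh
# On SIGTERM, stop the hung child, report the signal and exit non-zero.
trap 'kill "$pid" 2>/dev/null; echo "FATAL: $0: interrupted by signal 15"; exit 1' TERM
echo " PASS   first subtest"
sleep 30 &          # stands in for a subtest that hangs
pid=$!
wait "$pid"         # wait is interruptible, so the trap runs promptly
# Assumption: results are written only once the whole script finishes.
echo "passed 1" > "$1"
EOF

# Run the fake test under a 1 second limit; timeout sends SIGTERM by default.
timeout 1 sh "$results_dir/demo-test.sh" "$results_file"
status=$?

if [ -e "$results_file" ]; then
    results_written=yes
else
    results_written=no
    echo "'$results_file' does not exist!"
fi

rm -rf "$results_dir"
```
With GNU coreutils `timeout`, `status` is 124 when the time limit was
hit. If the real suite behaves similarly, the missing results files
would be a consequence of the kill rather than a separate problem, and
the open question would be why the tests hang in the first place.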

The same problem happens with notmuch 0.37 in Fedora's infrastructure
(koji rawhide, e.g.
https://koji.fedoraproject.org/koji/taskinfo?taskID=101014703).

Curiously, everything seems to work with notmuch 0.37 in Fedora 38,
which is the current release, in both koji and locally in mock.

BUT: In Fedora's secondary test-bed (copr) and with notmuch from git,
these kinds of errors happen on released Fedora versions, too. This was
somewhat erratic, but I suspected something related to Emacs 28 and
test timeouts. So I increased the timeout in the test lisp lib (see
below), hoping for an improvement. Instead things got worse, though at
least deterministically so: with this change, the test suite fails
reliably (the two tests mentioned above plus T315-emacs-tagging) on all
Fedoras (with Emacs 28) and passes on EPEL (with Emacs 27), see for
example:
https://copr.fedorainfracloud.org/coprs/mjg/notmuch-git/build/5908525/

Now, Emacs is not the only difference, and the complete test result
directory disappearing is still strange; really, all of this is
strange. Help please ;)

(There is another problem related to Python 3.12 which I'll address
separately - rawhide still carries 3.11.)

```
---
 test/test-lib.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/test-lib.el b/test/test-lib.el
index 79a9d4d6..39ade9b9 100644
--- a/test/test-lib.el
+++ b/test/test-lib.el
@@ -39,7 +39,7 @@
 (defun notmuch-test-wait ()
   "Wait for process completion."
   (while (get-buffer-process (current-buffer))
-    (accept-process-output nil 0.1)))
+    (accept-process-output nil 120)))

 (defun test-output (&optional filename)
   "Save current buffer to file FILENAME.  Default FILENAME is OUTPUT."
```