Subject: Re: parallel test failures

Date: Thu, 25 Feb 2021 21:33:28 +0200

To: David Bremner, notmuch@notmuchmail.org

From: Tomi Ollila


On Fri, Feb 19 2021, David Bremner wrote:

> I have intermittent failures when running the test suite on sufficiently
> parallel machines.  I have attached a log of such a failing build,
> although it does not seem especially illuminating.
>
> It takes anywhere from 5 to 300 runs to get a failure for me running on
> 60 hardware threads (30 cores). At least on this machine the number of
> tests that pass seems consistent at 1205

I made the following changes to log file write accesses during each test:

----
diff --git a/test/notmuch-test b/test/notmuch-test
index b58fd3b3..903a5dff 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -62,13 +62,16 @@ if test -z "$NOTMUCH_TEST_SERIALIZE" && command -v parallel >/dev/null ; then
         META_FAILURE="parallel test suite returned error code $RES"
     fi
 else
+    rm -rf inw; mkdir inw
     for test in $TESTS; do
+        testname=$(basename $test .sh)
+        inotifywait -d --outfile $PWD/inw/inw-$testname -r -e close_write,delete $PWD/test /tmp
         $TEST_TIMEOUT_CMD $test "$@" &
         wait $!
         # If the test failed without producing results, then it aborted,
         # so we should abort, too.
         RES=$?
+        pkill inotifywa
-        testname=$(basename $test .sh)
         if [[ $RES != 0 && ! -e "$NOTMUCH_BUILDDIR/test/test-results/$testname" ]]; then
             META_FAILURE="Aborting on $testname (returned $RES)"
             break
----

Then I ran the tests with NOTMUCH_TEST_SERIALIZE=t, and after that ran

for f in inw/*; do echo $f; sed -e 's,.*notmuch/test/,  ,' -e '/tmp.T/ s,/.*,,' $f | sort -u; echo; done | less

to examine the "fallout".

Based on that (only skimming the listing, not a systematic check), I did
not see any potentially overlapping writes, but I did notice some unrelated
inconsistencies in the test directories.
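
To rely less on eyeballing, something like the following could list the
paths that show up in more than one per-test log. It is only a sketch: the
name "overlaps" is made up, it assumes inotifywait's default line format
("watched-dir/ EVENTS file"), and it will mis-split paths that contain
whitespace. Since the logs come from a serialized run, a hit is just a
candidate for a conflict under parallel execution, not proof of one.

```shell
#!/bin/sh
# Hypothetical helper: print every path that was written or deleted
# according to more than one of the given inotifywait logs.
# Assumes lines of the form "<watched dir>/ <EVENTS> <file>".
overlaps() {
    for f in "$@"; do
        # Reduce each log to its distinct paths, so that repeated writes
        # within a single test do not count as an overlap.
        awk '{ print $1 $3 }' "$f" | sort -u
    done | sort | uniq -d
}
```

It would be called as "overlaps inw/*". Paths that are shared by design
(for example the test-results directory) would still need to be filtered
out by hand.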

Anyway, the log.gz did not show any tests failing, only parallel exiting
nonzero, possibly for some other reason; I cannot say. Running the suite
under strace (even with --seccomp-bpf) would probably make the failure even
less likely to happen :/
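
Since the failure only shows up once in 5 to 300 runs, a loop along these
lines could rerun the suite until it fails and keep only the log of the
failing run. Again just a sketch: "run_until_failure" and the log file
naming are made up for illustration, and it treats any nonzero exit status
of the command as a failure.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until it fails, keeping only the
# log of the failing run.
# Usage: run_until_failure MAX_RUNS LOGDIR COMMAND [ARGS...]
run_until_failure() {
    max=$1 logdir=$2
    shift 2
    mkdir -p "$logdir" || return 2
    i=1
    while [ "$i" -le "$max" ]; do
        log="$logdir/run-$i.log"
        if "$@" >"$log" 2>&1; then
            rm -f "$log"    # passing runs are not interesting
        else
            status=$?       # exit status of the command that just failed
            echo "run $i failed with status $status, log kept in $log"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
    return 0
}
```

For example "run_until_failure 300 faillogs make test" would stop at the
first failing run and leave its output under faillogs/.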

Tomi