Re: Bug#842291: notmuch processes frequently stuck in select()

Subject: Re: Bug#842291: notmuch processes frequently stuck in select()

Date: Wed, 23 Nov 2016 12:19:05 -0500

To: David Bremner, Brian May, 842291@bugs.debian.org, Robbie Harwood

Cc: notmuch@notmuchmail.org, Debian GnuPG packaging

From: Daniel Kahn Gillmor


Control: affects 842291 + gpgsm dirmngr

On Wed 2016-11-23 03:50:40 -0500, David Bremner wrote:
> David Bremner <david@tethera.net> writes:
>
>> Brian May <bam@debian.org> writes:
>>> strace shows notmuch looping in select.
>>>
>>> select(10, [9], [], NULL, {1, 0})       = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})       = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})       = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})       = 0 (Timeout)
>>> etc
>>>
>>
>> a backtrace would be helpful.
>>
>> d
>
> Nevermind, I managed to download the list archive for debian-devel and
> replicate the bug.
>
> The bug seems to be related to smime signature verification. After
> adding the attached mail message (and running notmuch-new), to replicate
> the hang it suffices to run
>
> % notmuch show --decrypt id:20161116T143809.GA.21721.stse@fsing.rootsland.net  
>
> As far as workarounds, turning off decryption / signature verification
> should allow you to at least view the thread.

I've noticed similar behavior, and it seems to correlate with gpgsm
asking dirmngr for an update to CRLs related to S/MIME certs.

Some CRL distribution points simply do not respond at all (resulting in
a timeout), and some do not respond, or are laggy, when accessed over
tor specifically.

I see a few possible ways to consider resolving this, none of them
great, and I don't know exactly how to implement any of them:

 0) turn off CRL updates entirely during s/mime signature verification

 1) do s/mime signature verification without CRL updates, but schedule
    CRL checks to happen in the background for dirmngr, so that future
    verifications will reflect the cert validity

 2) have dirmngr avoid checking CRLs that it knows it has already
    updated recently

 3) tell dirmngr to use much shorter CRL fetch timeouts
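
To make options 2 and 3 a bit more concrete, here is a minimal Python
sketch of the idea. It is not how dirmngr works and not a proposed
patch: CRLCache, min_interval and timeout are hypothetical names chosen
for illustration. It remembers when each CRL URL was last tried (whether
the fetch worked or not) and uses a short per-fetch timeout. Note that
the timeout passed to urlopen covers socket operations; name resolution
may not be bounded by it.

```python
import time
import urllib.request


class CRLCache:
    """Hypothetical helper (not part of dirmngr): remembers recent CRL fetch attempts."""

    def __init__(self, min_interval=3600.0, timeout=3.0, fetch=None,
                 clock=time.monotonic):
        self.min_interval = min_interval  # seconds before a URL is retried (option 2)
        self.timeout = timeout            # per-fetch timeout in seconds (option 3)
        self._fetch = fetch or self._http_fetch
        self._clock = clock
        self._last = {}                   # url -> (time of last attempt, bytes or None)

    def _http_fetch(self, url, timeout):
        # Plain HTTP fetch with a bounded socket timeout.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()

    def get(self, url):
        now = self._clock()
        if url in self._last:
            tried_at, data = self._last[url]
            if now - tried_at < self.min_interval:
                # Recent attempt (success or failure): stay off the network.
                return data
        try:
            data = self._fetch(url, self.timeout)
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            data = None
        self._last[url] = (now, data)
        return data
```

Remembering failures as well as successes is a policy choice: it keeps
an unreachable distribution point from stalling every message in a
thread, at the cost of verifying without a fresh CRL until the interval
expires.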


Some example traffic from my dirmngr that uses tor (complete with
timestamps indicating just how bad the delays can be):

Nov 22 14:08:24 alice dirmngr[11976]: no CRL available for issuer id 770B4DA5913F2572B9F679AE0819FB7D77572689
Nov 22 14:08:24 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:08:44 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data
Nov 22 14:08:44 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host not found
Nov 22 14:08:44 alice dirmngr[11976]: error retrieving 'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host
Nov 22 14:08:44 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:08:45 alice dirmngr[11976]: no CRL available for issuer id 770B4DA5913F2572B9F679AE0819FB7D77572689
Nov 22 14:08:45 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:09:05 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data
Nov 22 14:09:05 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host not found
Nov 22 14:09:05 alice dirmngr[11976]: error retrieving 'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: no CRL available for issuer id 26FD002905277B015EE9B2A3C092A348F28A4C6B
Nov 22 14:09:05 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:25 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No data
Nov 22 14:09:25 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host not found
Nov 22 14:09:25 alice dirmngr[11976]: error retrieving 'http://crl.startssl.com/sca-client1.crl': Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: no CRL available for issuer id 26FD002905277B015EE9B2A3C092A348F28A4C6B
Nov 22 14:09:25 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:45 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No data
Nov 22 14:09:45 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host not found
Nov 22 14:09:45 alice dirmngr[11976]: error retrieving 'http://crl.startssl.com/sca-client1.crl': Unknown host
Nov 22 14:09:45 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host

That's a 20-second lag for each failed fetch, adding up to roughly 80
seconds of delay in rendering a single thread where 4 messages were
signed with S/MIME certificates issued by two different authorities.
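
For anyone who wants to check the arithmetic against their own logs,
this is a small Python sketch that pairs each "fetching CRL" line with
the following "crl_fetch via DP failed" line and reports the elapsed
time. The LOG sample is copied from the dirmngr output above (only the
start and end line of each attempt); the function names are made up for
this example.

```python
from datetime import datetime

# Start/end lines of each fetch attempt, copied from the dirmngr log above.
LOG = """\
Nov 22 14:08:24 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:08:44 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:08:45 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:09:05 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:25 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:45 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
"""


def stamp(line):
    # Only the HH:MM:SS field is parsed; this assumes all entries are from the same day.
    return datetime.strptime(line[7:15], "%H:%M:%S")


def fetch_delays(log):
    """Return the seconds elapsed between each fetch start and its failure."""
    delays, started = [], None
    for line in log.splitlines():
        if "fetching CRL from" in line:
            started = stamp(line)
        elif "crl_fetch via DP failed" in line and started is not None:
            delays.append((stamp(line) - started).total_seconds())
            started = None
    return delays


delays = fetch_delays(LOG)
print(delays, sum(delays))  # [20.0, 20.0, 20.0, 20.0] 80.0
```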

Fwiw, crl.ll.mit.edu doesn't seem to respond over tor on port 80 at all
in some cases, and in other cases takes nearly a minute to reply:

0 dkg@alice:/tmp/cdtemp.Ue45bu$ time wget -q 'http://crl.ll.mit.edu/getcrl/LLCA3'

real	0m0.694s
user	0m0.008s
sys	0m0.008s
0 dkg@alice:/tmp/cdtemp.Ue45bu$ time torsocks wget -q 'http://crl.ll.mit.edu/getcrl/LLCA3'

real	0m58.828s
user	0m0.008s
sys	0m0.008s
0 dkg@alice:/tmp/cdtemp.Ue45bu$ 


Any thoughts on the best way to pursue this?

    --dkg
