Re: notmuch2 (python cffi bindings) segfault gdb logs

Date: Wed, 25 Nov 2020 10:35:27 +0000

To: Floris Bruynooghe, notmuch@notmuchmail.org

From: Patrick Totzke


Hello Floris, thanks for having a look at this!


Quoting Floris Bruynooghe (2020-11-24 21:31:00)

> Hi Patrick,
> 
> On Mon 23 Nov 2020 at 10:36 +0000, Patrick Totzke wrote:
> > I've been complaining about the new (and old) python bindings causing the python interpreter to segfault occasionally. So far I was not able to reproduce this reliably nor provide error traces. This has just changed:
> > see below and attached for what I got from gdb.
> 
> Your gdb info doesn't say explicitly (or I missed it), but this is
> showing a SEGFAULT I guess?

Yes, correct. It was triggered when I untagged some messages from my inbox in alot.
I forgot to mention version numbers:

    notmuch: 0.31+7~g981d5a0
    Python: 3.8.6
    alot: 0.9.1

notmuch and the bindings are compiled from git master, on a Debian testing system.


> > I hope that whoever is in charge of the bindings can make sense of
> > it. I don't have any experience so far with cffi nor gdb and have a
> > hard time debugging this. The logs below are my attempt to collect as
> > much detail as possible about. Please let me know if I missed
> > something.
> 
> From what I can tell we're calling a function to free something which
> segfaults, so it probably was freed already and we didn't know.  We need
> to find out who freed it before and why we thought it still needed to be
> freed.
 

It may help to know that this only ever happened when I tagged messages while the alot screen was not displaying the whole query result.
I presume this means there was a leftover reference to an existing query object, which could have been affected by libtalloc's memory management.

Alot reads thread IDs from notmuch2.Database.threads() in a generator:
https://github.com/pazz/alot/blob/master/alot/db/manager.py#L314
Could this be problematic? After all, it may resume reading from that generator some time later.
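
To make the concern concrete, here is a minimal pure-Python sketch of the pattern I have in mind. It does not use notmuch2 at all: FakeQuery, lazy_thread_ids and eager_thread_ids are hypothetical names, and the RuntimeError only stands in for whatever would really happen if the underlying object went away (presumably a crash rather than a clean exception). I am assuming, without having verified it, that the lazily read results depend on the query staying alive.

```python
class FakeQuery:
    """Hypothetical stand-in for a native query that owns its results."""

    def __init__(self, thread_ids):
        self._ids = list(thread_ids)
        self.alive = True

    def threads(self):
        # Results are produced lazily, one per next() call.
        for tid in self._ids:
            if not self.alive:
                # In native code this would be a use-after-free,
                # here it is just an exception.
                raise RuntimeError("query destroyed while iterating")
            yield tid

    def destroy(self):
        # Simulates the owning object being freed elsewhere.
        self.alive = False


def lazy_thread_ids(query):
    # Generator: the query must outlive every later next() call.
    for tid in query.threads():
        yield tid


def eager_thread_ids(query):
    # Reads everything up front, so nothing depends on the query afterwards.
    return list(query.threads())


query = FakeQuery(["t1", "t2", "t3"])
lazy = lazy_thread_ids(query)
first = next(lazy)               # only the first result has been read
eager = eager_thread_ids(query)  # all results copied into a list

query.destroy()                  # the query goes away in between

try:
    next(lazy)
    lazy_failed = False
except RuntimeError:
    lazy_failed = True
```

In this toy version the eager list stays usable after destroy(), while the half-consumed generator fails on its next read. Whether the real bindings behave anything like this is exactly what I am unsure about.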


> > (gdb) info threads
> >   Id   Target Id                                     Frame 
> > * 1    Thread 0x7ffff7c0e740 (LWP 3614451) "python3" __GI_raise (sig=sig@entry=6)
> >     at ../sysdeps/unix/sysv/linux/raise.c:50
> 
> From this I gather we only have one thread, could you confirm this?
> notmuch2 just isn't thread safe at the moment (I forget whether this was
> intentional or by accident, might have been intentional depending on how
> threadsafe libnotmuch is).


Yes, I'm quite sure, though not 100%, as I did not write the port of alot's backend to notmuch2.


> > Traceback (most recent call first):
> >   <built-in method notmuch_thread_destroy of CompiledLib object at remote 0x7ffff636f040>
> >   File "/home/pazz/.local/lib/python3.8/site-packages/notmuch2/_thread.py", line 38, in _destroy
> >     capi.lib.notmuch_thread_destroy(self._thread_p)
> >   File "/home/pazz/.local/lib/python3.8/site-packages/notmuch2/_thread.py", line 34, in __del__
> >     self._destroy()
> >   File "/home/pazz/projects/alot/alot/db/manager.py", line 570, in get_threads
> >   <built-in method next of module object at remote 0x7ffff78b70e0>
> 
> I pulled alot master and this does not match at all.  Could you tell me
> which git ref this was using so I can try and see what alot is actually
> doing?  (or some other way of sharing the source in this backtrace)


This happened on alot master (7915ea60ba866010abc728851626df96d8b80816) for me.
I should say that I've had this issue long before, even before alot used the new bindings.

Another stab in the dark: could this be due to concurrent changes to the notmuch index made by my mail sync/tagging script?
I am using afew (https://github.com/afewmail/afew), which is still on the old Python bindings as far as I am aware.

Thanks again for your efforts Floris!
P