Re: DRAFT Introduce CFFI-based Python bindings

Subject: Re: DRAFT Introduce CFFI-based Python bindings

Date: Wed, 29 Nov 2017 22:26:05 +0100

To: notmuch@notmuchmail.org

Cc:

From: Floris Bruynooghe


Patrick Totzke <patricktotzke@gmail.com> writes:

> Quoting David Bremner (2017-11-28 23:59:26)
>> Floris Bruynooghe <flub@devork.be> writes:
>> 
>> >
>> > Lastly there are some downsides to the choices I made:
>> > - I ended up going squarely for CPython 3.6+.  Choosing Python
>> >   3 allowed better API design, e.g. with keyword-only parameters
>> >   etc.  Choosing CPython 3.4+ restricts the madness that can
>> >   happen with __del__ and gives some newer (tho now unused)
>> >   features in weakref.finalize.
>> > - This is no longer drop-in compatible.
>> > - I haven't got to a stage where my initial goal of speed has
>> >   been proven yet.
>> 
>> I guess you'll have to convince the maintainers / users of alot and afew
>> that this makes sense before we go much further. I'd point out that
>> Debian stable is only at python 3.5, so that makes me a bit wary of this
>> (being able to run the test suite on debian stable and similar aged
>> distros is useful for me, and I suspect for other developers).
>> 
>> I know there are issues with memory management in the current bindings,
>> so that may be a strong reason to push to python 3.6; it seems to need
>> more investigation at the moment.
>> 
>> d
>
>
> I am generally in favour of modernizing the notmuch python bindings,
> especially when it comes to memory management and exception handling.
>
> At the moment, the alot interface officially only supports python v2.7
> but our dependencies have now mostly been updated and we are working on
> a port to python 3, see here: https://github.com/pazz/alot/pull/1055
>
> @Floris, you are welcome to join #alot on freenode if you want to
> discuss details on that.
>
> You mention that your new API breaks compatibility with the existing
> ones. Do you have some demo code that uses the new API for reference?

Here's a short, untested example which works with what's been posted:

import notdb  # the module from the posted patch

db = notdb.Database.create()
# or
db = notdb.Database(path=None, mode=notdb.Database.MODE.READ_WRITE)
print(db.path)  # a pathlib.Path (a py34 dependency)
if 'unread' in db.tags:  # tags behaves like a set
    print('unread mail!')
with db.atomic():
    msg = db.add('/path/to/file')
    msg = db.get('/path/to/file')
    msg = db.find('some-msgid')
    db.remove('/path/to/other/file')
# sorry, don't have a query interface yet
assert 'unread' in msg.tags
for tag in msg.tags:
    print(f'a tag: {tag}')
with msg.frozen():  # Message.frozen() not yet implemented
    msg.tags.clear()  # all set operations supported
    msg.tags.add('atag')
msg.tags_to_flags()
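
Since Message.frozen() isn't implemented yet, here is a minimal sketch of how it could look, assuming it wraps libnotmuch's notmuch_message_freeze()/notmuch_message_thaw() pair. The Message class and its _freeze/_thaw methods are hypothetical stand-ins (a counter replaces the C calls, which nest), not code from the posted patch. db.atomic() could follow the same pattern around notmuch_database_begin_atomic()/notmuch_database_end_atomic().

```python
import contextlib


class Message:
    """Hypothetical stand-in for notdb.Message; not the posted patch's code."""

    def __init__(self):
        self.tags = set()
        self.freeze_depth = 0  # libnotmuch lets freeze/thaw calls stack

    def _freeze(self):
        self.freeze_depth += 1  # stand-in for notmuch_message_freeze()

    def _thaw(self):
        self.freeze_depth -= 1  # stand-in for notmuch_message_thaw()

    @contextlib.contextmanager
    def frozen(self):
        """Group tag changes so they are committed together on thaw."""
        self._freeze()
        try:
            yield self
        finally:
            self._thaw()  # always thaw, even if the block raised


msg = Message()
with msg.frozen():
    msg.tags.clear()
    msg.tags.add('atag')
```

Using try/finally inside the generator means an exception in the with block can never leave the message frozen.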

I imagine the query interface would be something like:

with db.query('tag:atag') as query:
    print(f'results: {query.count}')
    for msg in query:
        print(msg.path)
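
One way such a query object could manage its C-level resources is sketched below, assuming a dict stands in for the notmuch_query_t handle and _destroy() for notmuch_query_destroy(). The Query class, its toy 'tag:NAME' matching and the dict-based db are all hypothetical. The with block releases the handle deterministically, while weakref.finalize acts as a safety net if the caller never uses with.

```python
import weakref


class Query:
    """Hypothetical query wrapper; not the posted patch's code."""

    def __init__(self, db, querystr):
        self._db = db  # keep the parent alive while the query exists
        # Only 'tag:NAME' queries are understood by this toy version.
        _, _, tag = querystr.partition(':')
        results = [path for path, tags in db.items() if tag in tags]
        self._handle = {'query': querystr, 'results': results}  # fake C handle
        # The finalizer must not reference self, only the handle to release.
        self._finalizer = weakref.finalize(self, Query._destroy, self._handle)

    @staticmethod
    def _destroy(handle):
        handle.clear()  # stand-in for notmuch_query_destroy()

    @property
    def count(self):
        return len(self._handle['results'])

    def __iter__(self):
        return iter(self._handle['results'])

    def close(self):
        self._finalizer()  # runs _destroy() at most once

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


db = {'/mail/1': {'atag', 'unread'}, '/mail/2': {'inbox'}}
with Query(db, 'tag:atag') as query:
    print(f'results: {query.count}')
    for path in query:
        print(path)
```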

But to be honest, so far I've spent most of my time getting the memory
safety figured out (which I hope I finally have), and I think the tag
handling is the nicest thing to show off at this point.  Tags are
completely normal Python sets with no special behaviour at all (well,
that's not quite true: there's the binary interface, see the posted
code for this).
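
As an illustration of how a tag collection can offer every set operation while still writing through to libnotmuch, here is a sketch built on collections.abc.MutableSet. The TagSet name is hypothetical and a plain Python set stands in for the stored tags; the posted patch may do this differently.

```python
import collections.abc


class TagSet(collections.abc.MutableSet):
    """Hypothetical set-like tag view; not the posted patch's code.

    Only five methods are written by hand; MutableSet derives the rest
    (|=, &=, -=, clear(), remove(), pop(), <=, ==, isdisjoint(), ...).
    """

    def __init__(self, tags=()):
        self._tags = set(tags)  # stand-in for the tags stored by libnotmuch

    def __contains__(self, tag):
        return tag in self._tags

    def __iter__(self):
        return iter(self._tags)

    def __len__(self):
        return len(self._tags)

    def add(self, tag):
        self._tags.add(tag)  # a real binding would call notmuch_message_add_tag()

    def discard(self, tag):
        self._tags.discard(tag)  # ... and notmuch_message_remove_tag() here


tags = TagSet({'unread', 'inbox'})
tags |= {'atag'}
tags.discard('inbox')
```

Every derived operation funnels through add() and discard(), so those two methods are the only places that would need to talk to the C library.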

Actually, this last point is kind of important and I failed to mention
it before too.  The existing Python bindings convert many byte strings
from libnotmuch to Python strings, that is, unicode on Python 3.  For
many of them they use b'bytes'.decode('utf-8', errors='ignore'), which
is a sane default if you want to display things.  But if you need to
round-trip a tag and store it again, you might end up changing the tag,
because any bytes that are not valid UTF-8 are silently dropped.  I've
not yet found the right way to handle binary data everywhere (it is
also needed for message IDs, for example), but for tags I've gone with:

for tag in iter(msg.tags):  # iter() is normally called implicitly by the for loop
    print(f'All unicode: {tag}')
for tag in msg.tags.iter(encoding=None):  # yields raw bytes
    other_msg.tags.add(tag)  # passes pure bytes around, so nothing is lost
# For the message ID I've used a "BinString" type for now
print(f'All just unicode: {msg.messageid}')
binary_msgid = bytes(msg.messageid)  # no lossy conversion
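
To make the round-trip problem concrete, the snippet below decodes a tag containing a byte that is not valid UTF-8. The example tag is made up. For comparison it also shows the standard library's 'surrogateescape' error handler, which does round-trip; that handler is not what the post proposes, just a stdlib alternative worth knowing about.

```python
raw_tag = b'caf\xe9'  # Latin-1 encoded "café": not valid UTF-8

# Decoding the way the existing bindings do: fine for display,
# but the invalid byte is silently dropped.
shown = raw_tag.decode('utf-8', errors='ignore')
print(shown)  # caf
lossy_roundtrip = shown.encode('utf-8')
print(lossy_roundtrip == raw_tag)  # False: storing it back changes the tag

# Stdlib alternative (not what the post uses): surrogateescape carries the
# undecodable byte through as a lone surrogate, so encoding restores it.
escaped = raw_tag.decode('utf-8', errors='surrogateescape')
lossless_roundtrip = escaped.encode('utf-8', errors='surrogateescape')
print(lossless_roundtrip == raw_tag)  # True
```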

This BinString stuff is somewhat hacky, and I'm not sure how sane it
is.  The second iterator on tags feels cleaner.  Alternatively, tags
could be BinString as well instead of plain str.
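
A minimal sketch of what a BinString-style type could look like, assuming it is a str subclass that remembers the raw bytes it was decoded from. The class body and the example message ID are illustrative, not the posted code. str() gives the lossy display text; bytes() works because the bytes() constructor consults __bytes__ before rejecting a str argument.

```python
class BinString(str):
    """Hypothetical sketch: a str that also keeps its original bytes."""

    def __new__(cls, data: bytes, encoding: str = 'utf-8', errors: str = 'ignore'):
        # The str value is the display form, decoded like the old bindings do.
        obj = super().__new__(cls, data.decode(encoding, errors=errors))
        obj._bytes = data  # keep the exact bytes for lossless round-trips
        return obj

    def __bytes__(self) -> bytes:
        return self._bytes


raw = b'<caf\xe9@example.org>'  # made-up message ID with a non-UTF-8 byte
msgid = BinString(raw)
print(f'All just unicode: {msgid}')  # All just unicode: <caf@example.org>
binary_msgid = bytes(msgid)  # the original bytes, nothing lost
```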

Cheers,
Floris
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
