Re: [PATCH 1/1] python/notmuch2: provide binding for Database.directory()

Subject: Re: [PATCH 1/1] python/notmuch2: provide binding for Database.directory()

Date: Sat, 31 Jul 2021 12:14:23 +0200

To: Ludovic LANGE, notmuch@notmuchmail.org

Cc:

From: Floris Bruynooghe


Hi Ludovic,

Thanks for following up and your patience waiting for my review.

Also, would you mind following the v2, v3 tagging of patches as
described on https://notmuchmail.org/contributing/#index15h2 I know it's
not straightforward and I also have to look it up every time, but it
does help avoiding confusion.

> In the (new) cffi bindings, the corresponding call was not yet implemented.
> We provide a naive implementation (renamed to Database.directory()), and the
> implementation of the needed (internal) Directory object.
> ---
>
> Hello,
>
> This my second try at updating the python-cffi bindings.
> Thanks to the feedback of @Floris and @Tomi, this new tentative includes most
> of the suggestions made:
> * Database.get_directory(..) renamed to Database.directory()
> * Database.directory() is properly documented
> * Database.directory() accepts, for its path argument, str, bytes or os.PathLike
> * Iterators returning directories or files are now returning Path (not PurePath)
> * Directory.__init__() accepts, for its path argument, str, bytes or os.PathLike
> * Directory class is properly documented
> * Directory.set_mtime() accepts int or float (uses int internally)
> * Directory.get_child_files() renamed to Directory.files()
> * Directory.files() can, optionally, return absolute paths (instead of, by
> 	default, relative paths) (*)
> * Directory.get_child_directories() renamed to Directory.directories()
> * Directory.directories() can, optionally, return absolute paths (instead of, by
> 	default, relative paths) (*)
> * Directory.__repr__ simplified
>
>
> Not (yet ?) implemented:
> * Suggestion to remove the code `if not hasattr(os, 'PathLike') and isinstance(
> path, pathlib.Path`, I'd like to understand if we should do it for all the 6
> other occurences in Database or if they are special cases. May be in another patch ?

Yes this can, and probably should, be done in all occurrences.  I'd just
do it here in this patch and do the others in a separate patch as
that'll make reviewing easier.

> * Choosing betwen getters/setters OR property for Directory.set/get/mtime - some
> more discussion could be helping ?

So I think it's best to stick to what was introduced in
`Database.decrypt_policy` and go with the property, mostly for reasons
of consistency.

> (*) about relative vs absolute and default value:
> My use-case for this whole patch is related to [Alot MUA](https://github.com/pazz/alot)
> and more specifically trying to revive [an old patch](https://github.com/pazz/alot/pull/1170),
> which tries to display the folders in a TreeView.
> The previous python bindings, as well as the C-API, all return relative paths (
> 	relative to the current Directory).
> Having list of a relative paths is a plus for the abovementionned patch,
> as we won't have to iterate on the result and basename() those paths. It'll save
> some (precious) time on a lot of folders.
> That being said, I can see an interest for having a full, workable path (also
> 	without having to convert them).
> So I introduced an optional parameter (defaulting to 'relative' mode) to allow
> the end-user to choose absolute paths.
>
>
> Thank you in advance for your time reviewing this patchset !
>
> Regards,
>
> Ludovic.
>
> PS: I very slightly changed my email address since last patch, this one is the good one.
> PPS: Should we provide an update of the NEWS file ?
>
>
>  bindings/python-cffi/notmuch2/__init__.py   |   1 +
>  bindings/python-cffi/notmuch2/_build.py     |  18 ++
>  bindings/python-cffi/notmuch2/_database.py  |  39 ++++-
>  bindings/python-cffi/notmuch2/_directory.py | 174 ++++++++++++++++++++
>  4 files changed, 230 insertions(+), 2 deletions(-)
>  create mode 100644 bindings/python-cffi/notmuch2/_directory.py
>
> diff --git a/bindings/python-cffi/notmuch2/__init__.py b/bindings/python-cffi/notmuch2/__init__.py
> index f281edc1..32067ddc 100644
> --- a/bindings/python-cffi/notmuch2/__init__.py
> +++ b/bindings/python-cffi/notmuch2/__init__.py
> @@ -45,6 +45,7 @@ usually expect from Python containers.
>  from notmuch2 import _capi
>  from notmuch2._base import *
>  from notmuch2._database import *
> +from notmuch2._directory import *
>  from notmuch2._errors import *
>  from notmuch2._message import *
>  from notmuch2._tags import *
> diff --git a/bindings/python-cffi/notmuch2/_build.py b/bindings/python-cffi/notmuch2/_build.py
> index f712b6c5..0f0a0a46 100644
> --- a/bindings/python-cffi/notmuch2/_build.py
> +++ b/bindings/python-cffi/notmuch2/_build.py
> @@ -134,6 +134,10 @@ ffibuilder.cdef(
>      notmuch_database_get_revision (notmuch_database_t *notmuch,
>                                     const char **uuid);
>      notmuch_status_t
> +    notmuch_database_get_directory (notmuch_database_t *database,
> +                                    const char *path,
> +                                    notmuch_directory_t **directory);
> +    notmuch_status_t
>      notmuch_database_index_file (notmuch_database_t *database,
>                                   const char *filename,
>                                   notmuch_indexopts_t *indexopts,
> @@ -303,6 +307,20 @@ ffibuilder.cdef(
>      void
>      notmuch_tags_destroy (notmuch_tags_t *tags);
>  
> +    notmuch_status_t
> +    notmuch_directory_set_mtime (notmuch_directory_t *directory,
> +                                 time_t mtime);
> +    time_t
> +    notmuch_directory_get_mtime (notmuch_directory_t *directory);
> +    notmuch_filenames_t *
> +    notmuch_directory_get_child_files (notmuch_directory_t *directory);
> +    notmuch_filenames_t *
> +    notmuch_directory_get_child_directories (notmuch_directory_t *directory);
> +    notmuch_status_t
> +    notmuch_directory_delete (notmuch_directory_t *directory);
> +    void
> +    notmuch_directory_destroy (notmuch_directory_t *directory);
> +
>      notmuch_bool_t
>      notmuch_filenames_valid (notmuch_filenames_t *filenames);
>      const char *
> diff --git a/bindings/python-cffi/notmuch2/_database.py b/bindings/python-cffi/notmuch2/_database.py
> index 868f4408..a42d5402 100644
> --- a/bindings/python-cffi/notmuch2/_database.py
> +++ b/bindings/python-cffi/notmuch2/_database.py
> @@ -9,6 +9,7 @@ import weakref
>  import notmuch2._base as base
>  import notmuch2._config as config
>  import notmuch2._capi as capi
> +import notmuch2._directory as directory
>  import notmuch2._errors as errors
>  import notmuch2._message as message
>  import notmuch2._query as querymod
> @@ -337,8 +338,42 @@ class Database(base.NotmuchObject):
>          rev = capi.lib.notmuch_database_get_revision(self._db_p, raw_uuid)
>          return DbRevision(rev, capi.ffi.string(raw_uuid[0]))
>  
> -    def get_directory(self, path):
> -        raise NotImplementedError
> +    def directory(self, path):
> +        """Returns a :class:`Directory` from the database given a pathname.
> +
> +        If a directory with the given pathname exists in the database
> +        return the :class:`Directory` instance for it.
> +        Otherwise raise a :exc:`LookupError` exception.
> +
> +        :param path: A path relative to the path of database (see :attr:`path`),
> +              or else an absolute path with initial components that match the
> +              path of database.
> +        :type path: str, bytes, os.PathLike or pathlib.Path

pathlib.Path is os.PathLike so no real need to mention it explicitly.
but it doesn't do any harm

> +        :returns: :class:`Directory` or raises an exception.
> +        :raises LookupError: The directory object referred to by ``path``
> +           does not exist in the database.
> +        :raises XapianError: A Xapian exception occurred.
> +        :raises UpgradeRequiredError: The database must be upgraded
> +           first.
> +        :raises OutOfMemoryError: When there is no memory to allocate
> +           the message instance.
> +        """
> +        if not hasattr(os, 'PathLike') and isinstance(path, pathlib.Path):
> +            path = bytes(path)

As mentioned before, I'd still remove this, `os.fsencode()` below
handles this automatically.

> +        directory_pp = capi.ffi.new('notmuch_directory_t **')
> +        ret = capi.lib.notmuch_database_get_directory(
> +          self._db_p, os.fsencode(path), directory_pp)
> +
> +        if ret != capi.lib.NOTMUCH_STATUS_SUCCESS:
> +            raise errors.NotmuchError(ret)
> +
> +        directory_p = directory_pp[0]
> +        if directory_p == capi.ffi.NULL:
> +            raise LookupError
> +
> +        path = pathlib.Path(path)
> +        ret_dir = directory.Directory(path.resolve(), directory_p, self)
> +        return ret_dir
>  
>      def default_indexopts(self):
>          """Returns default index options for the database.
> diff --git a/bindings/python-cffi/notmuch2/_directory.py b/bindings/python-cffi/notmuch2/_directory.py
> new file mode 100644
> index 00000000..fea11e22
> --- /dev/null
> +++ b/bindings/python-cffi/notmuch2/_directory.py
> @@ -0,0 +1,174 @@
> +import os
> +import pathlib
> +
> +import notmuch2._base as base
> +import notmuch2._capi as capi
> +import notmuch2._errors as errors
> +from ._message import FilenamesIter

In general PEP8 prefers to import modules rather than things inside it,
so I'd `import notmuch2._message as message`.  I also think this would
be the first relative import.  It probably doesn't matter much but for
consistency I'd avoid it.

> +
> +__all__ = ["Directory"]
> +
> +
> +class PathIter(FilenamesIter):
> +    """Iterator for pathlib.Path objects."""
> +
> +    def __init__(self, parent, iter_p, basepath=""):
> +        self._basepath = basepath
> +        super().__init__(parent, iter_p)
> +
> +    def __next__(self):
> +        fname = super().__next__()
> +        return pathlib.Path(self._basepath, os.fsdecode(fname))
> +
> +
> +class Directory(base.NotmuchObject):
> +    """Represents a directory entry in the notmuch database.
> +
> +    This should not be directly created, instead it will be returned
> +    by calling :meth:`Database.directory`.  A directory keeps a
> +    reference to the database object since the database object can not
> +    be released while the directory is in use.
> +
> +    Modifying attributes of this object will modify the
> +    database, not the real directory attributes.

Nitpicking, but I reformatted the below a bit to follow PEP
conventions a bit more.

> +    :param path: The absolute path of the directory object.
> +    :type path: str, bytes, os.PathLike or pathlib.Path
> +    :param dir_p: The pointer to an internal notmuch_directory_t object.
> +    :type dir_p: C-api notmuch_directory_t
> +    :param parent: The object this Directory is derived from (usually a
> +       :class:`Database`). We do not directly use this, but store a
> +       reference to it as long as this Directory object lives. This
> +       keeps the parent object alive.
> +    :type parent: Database
> +    """
> +
> +    _msg_p = base.MemoryPointer()
> +
> +    def __init__(self, path, dir_p, parent):
> +        """
> +        """

no point in an empty docstring :)

> +        self._path = pathlib.Path(path)
> +        self._dir_p = dir_p
> +        self._parent = parent
> +
> +    @property
> +    def alive(self):
> +        if not self._parent.alive:
> +            return False
> +        try:
> +            self._dir_p
> +        except errors.ObjectDestroyedError:
> +            return False
> +        else:
> +            return True
> +
> +    def __del__(self):
> +        """Close and free the Directory"""
> +        self._destroy()
> +
> +    def _destroy(self):
> +        if self.alive:
> +            capi.lib.notmuch_directory_destroy(self._dir_p)
> +        self._dir_p = None
> +
> +    def set_mtime(self, mtime):

@mtime.setter
def mtime(self, mtime):

Basically make this in a getter/setter property using the same style as
Database.decrypt_policy is written.  And probably should be moved to
below the getter.

> +        """Sets the mtime value of this directory in the database
> +
> +        The intention is for the caller to use the mtime to allow efficient
> +        identification of new messages to be added to the database. The
> +        recommended usage is as follows:
> +
> +        * Read the mtime of a directory from the filesystem
> +
> +        * Call :meth:`Database.index_file` for all mail files in
> +          the directory
> +
> +        * Call notmuch_directory_set_mtime with the mtime read from the
> +          filesystem.  Then, when wanting to check for updates to the
> +          directory in the future, the client can call :meth:`get_mtime`
> +          and know that it only needs to add files if the mtime of the
> +          directory and files are newer than the stored timestamp.
> +
> +          .. note::
> +
> +                :meth:`get_mtime` function does not allow the caller to
> +                distinguish a timestamp of 0 from a non-existent timestamp. So
> +                don't store a timestamp of 0 unless you are comfortable with
> +                that.

I think Id'd like to go further: do not allow setting 0, raising a
ValueError instead.  To allow removing the mtime we would require an
explicit `None` value.

> +        :param mtime: A POSIX timestamp
> +        :type mtime: int or float
> +        :raises: :exc:`XapianError` a Xapian exception occurred, mtime
> +                 not stored
> +        :raises: :exc:`ReadOnlyDatabaseError` the database was opened
> +                 in read-only mode so directory mtime cannot be modified
> +        """
> +        ret = capi.lib.notmuch_directory_set_mtime(self._dir_p, int(mtime))
> +
> +        if ret != capi.lib.NOTMUCH_STATUS_SUCCESS:
> +            raise errors.NotmuchError(ret)
> +
> +    def get_mtime(self):

@property
def mtime(self):

> +        """Gets the mtime value of this directory in the database
> +
> +        Retrieves a previously stored mtime for this directory.
> +
> +        :returns: A POSIX timestamp
> +        :rtype: int
> +        """
> +        return capi.lib.notmuch_directory_get_mtime(self._dir_p)

I reckon we should check the value and `raise AttributeError` if it is
`0`.  This hides the low-level C-ism.  And even if you are interacting
with another program that does use this 0 for something else you still
don't lose any information.

> +
> +    # Make mtime attribute a property of Directory()
> +    mtime = property(
> +        get_mtime,
> +        set_mtime,
> +        doc="""Property that allows getting
> +                     and setting of the Directory *mtime* (read-write)
> +
> +                     See :meth:`get_mtime` and :meth:`set_mtime` for usage and
> +                     possible exceptions.""",
> +    )

As said, let's use the existing @property and @prop.setter syntax and
remove this.

> +    def files(self, absolute=False):

I don't think I'm a fan of the absolute parameter, it seems weird.  Why
not always make it absolute?  (or keep it relative and go back to
PurePath ;))

> +        """Gets a :class:`PathIter` iterator listing all the filenames of
> +        messages in the database within the given directory.

nitpicking, and being more explicit for people like me who get easily
confused by this API:

"""Returns iterator of all files in the directory.

The iterator yields all the files as known in the database, this is
not a filesystem query.

> +
> +        :param absolute: `True` to return complete paths, and `False`
> +                         to return basename-entries only (not complete paths),
> +                         defaults to `False`

For some reason python docstrings use 3-space indents for continuations
instead of aligning.  I which we had auto-formatters for this.

> +        :type absolute: boolean, optional
> +
> +        :returns: An iterator yielding :class:`pathlib.Path` instances.
> +        :rtype: PathIter
> +        """
> +        fnames_p = capi.lib.notmuch_directory_get_child_files(self._dir_p)
> +        return PathIter(self, fnames_p, self.path if absolute else "")
> +
> +    def directories(self, absolute=False):

same wrt absolute parameter

> +        """Gets a :class:`PathIter` iterator listing all the filenames of
> +        sub-directories in the database within the given directory.
> +
> +        :param absolute: `True` to return complete paths, and `False`
> +                         to return basename-entries only (not complete paths),
> +                         defaults to `False`
> +        :type absolute: boolean, optional
> +
> +        :returns: An iterator yielding :class:`pathlib.Path` instances.
> +        :rtype: PathIter
> +        """
> +        fnames_p = capi.lib.notmuch_directory_get_child_directories(self._dir_p)
> +        return PathIter(self, fnames_p, self.path if absolute else "")
> +
> +    @property
> +    def path(self):
> +        """the absolute path of this Directory as a :class:`pathlib.Path` instance.
> +
> +        :rtype: pathlib.Path
> +        """
> +        return self._path
> +
> +    def __repr__(self):
> +        """Object representation"""
> +        return '<Directory(path={self.path})>'.format(self=self)


Cheers,
Floris
_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-leave@notmuchmail.org

Thread: