Re: [PATCH 1/1] python/notmuch2: provide binding for database_get_directory()

Subject: Re: [PATCH 1/1] python/notmuch2: provide binding for database_get_directory()

Date: Sun, 25 Jul 2021 22:18:58 +0200

To: Ludovic LANGE, notmuch@notmuchmail.org

Cc: Ludovic LANGE

From: Floris Bruynooghe


Hi Ludovic,

Thanks for having a look at this!

On Sun 25 Jul 2021 at 10:16 +0200, Ludovic LANGE wrote:

> diff --git a/bindings/python-cffi/notmuch2/_database.py b/bindings/python-cffi/notmuch2/_database.py
> index 868f4408..e48fa895 100644
> --- a/bindings/python-cffi/notmuch2/_database.py
> +++ b/bindings/python-cffi/notmuch2/_database.py
> @@ -13,6 +13,7 @@ import notmuch2._errors as errors
>  import notmuch2._message as message
>  import notmuch2._query as querymod
>  import notmuch2._tags as tags
> +import notmuch2._directory as directory
>  
>  
>  __all__ = ['Database', 'AtomicContext', 'DbRevision']
> @@ -338,7 +339,46 @@ class Database(base.NotmuchObject):
>          return DbRevision(rev, capi.ffi.string(raw_uuid[0]))
>  
>      def get_directory(self, path):
> -        raise NotImplementedError
> +        """Returns a :class:`Directory` from the database for path,
> +
> +        :param path: An unicode string containing the path relative to the path
> +              of database (see :attr:`path`), or else should be an absolute
> +              path with initial components that match the path of 'database'.

Instead of a unicode string instead it should accept anything that
`os.fspath` accepts: str, bytes or os.PathLike.  This allows using
e.g. pathlib.Path as well.

> +        :returns: :class:`Directory` or raises an exception.
> +        :raises: :exc:`FileError` if path is not relative database or absolute
> +                 with initial components same as database.
> +
> +
> +        :raises XapianError: A Xapian exception occurred.
> +        :raises LookupError: The directory object referred to by ``pathname``

path or pathname?  choose one ;)

> +           does not exist in the database.
> +        :raises FileNotEmailError: The file referreed to by
> +           ``pathname`` is not recognised as an email message.

same

> +        :raises UpgradeRequiredError: The database must be upgraded
> +           first.
> +        """
> +        if not hasattr(os, 'PathLike') and isinstance(path, pathlib.Path):
> +            path = bytes(path)

I think `path = os.fspath(path)` should handle this fine.  But I don't
think you even need to do that as `os.fsencode()` will just do the right
thing already.

> +        directory_pp = capi.ffi.new('notmuch_directory_t **')
> +        ret = capi.lib.notmuch_database_get_directory(
> +          self._db_p, os.fsencode(path), directory_pp)
> +
> +        if ret != capi.lib.NOTMUCH_STATUS_SUCCESS:
> +            raise errors.NotmuchError(ret)
> +
> +        directory_p = directory_pp[0]
> +        if directory_p == capi.ffi.NULL:
> +            raise LookupError
> +
> +        if os.path.isabs(path):
> +            # we got an absolute path
> +            abs_dirpath = path
> +        else:
> +            #we got a relative path, make it absolute
> +            abs_dirpath = os.path.abspath(os.path.join(self.get_path(), path))
> +
> +        ret_dir = directory.Directory(abs_dirpath, directory_p, self)

I think the code has been trying to use pathlib for path manipulations,
so for consistency it'd be good to keep doing that  This if condition
becomes much simpler:

path = pathlib.Path(path)  # this is a no-op if it already is an instance
ret_dir = directory.Directory(path.absolute(), directory_p, self)

> +        return ret_dir
>  
>      def default_indexopts(self):
>          """Returns default index options for the database.
> diff --git a/bindings/python-cffi/notmuch2/_directory.py b/bindings/python-cffi/notmuch2/_directory.py
> new file mode 100644
> index 00000000..1d48aa54
> --- /dev/null
> +++ b/bindings/python-cffi/notmuch2/_directory.py
> @@ -0,0 +1,164 @@
> +import os
> +import pathlib
> +
> +import notmuch2._base as base
> +import notmuch2._capi as capi
> +import notmuch2._errors as errors
> +from ._message import FilenamesIter
> +
> +__all__ = ["Directory"]
> +
> +
> +class PurePathIter(FilenamesIter):
> +    """Iterator for pathlib.PurePath objects."""
> +
> +    def __next__(self):
> +        fname = super().__next__()
> +        return pathlib.PurePath(os.fsdecode(fname))

I'm surprised you explicitly need a PurePath here?


> +class Directory(base.NotmuchObject):
> +    """Represents a directory entry in the notmuch directory
> +
> +    Modifying attributes of this object will modify the
> +    database, not the real directory attributes.
> +
> +    The Directory object is usually derived from another object
> +    e.g. via :meth:`Database.get_directory`, and will automatically be
> +    become invalid whenever that parent is deleted. You should
> +    therefore initialized this object handing it a reference to the
> +    parent, preventing the parent from automatically being garbage
> +    collected.
> +    """
> +
> +    _msg_p = base.MemoryPointer()
> +
> +    def __init__(self, path, dir_p, parent):
> +        """
> +        :param path:   The absolute path of the directory object.

For consistency I'd require this to be `pathlib.Path`.

> +        :param dir_p:  The pointer to an internal notmuch_directory_t object.
> +        :param parent: The object this Directory is derived from
> +                       (usually a :class:`Database`). We do not directly use
> +                       this, but store a reference to it as long as
> +                       this Directory object lives. This keeps the
> +                       parent object alive.
> +        """

Just add this docstring the the class docstring, that's the usual way to
document __init__.

> +        self._path = path
> +        self._dir_p = dir_p
> +        self._parent = parent
> +
> +    @property
> +    def alive(self):
> +        if not self._parent.alive:
> +            return False
> +        try:
> +            self._dir_p
> +        except errors.ObjectDestroyedError:
> +            return False
> +        else:
> +            return True
> +
> +    def __del__(self):
> +        """Close and free the Directory"""
> +        self._destroy()
> +
> +    def _destroy(self):
> +        if self.alive:
> +            capi.lib.notmuch_directory_destroy(self._dir_p)
> +        self._dir_p = None
> +
> +    def set_mtime(self, mtime):
> +        """Sets the mtime value of this directory in the database
> +
> +        The intention is for the caller to use the mtime to allow efficient
> +        identification of new messages to be added to the database. The
> +        recommended usage is as follows:
> +
> +        * Read the mtime of a directory from the filesystem
> +
> +        * Call :meth:`Database.index_file` for all mail files in
> +          the directory
> +
> +        * Call notmuch_directory_set_mtime with the mtime read from the
> +          filesystem.  Then, when wanting to check for updates to the
> +          directory in the future, the client can call :meth:`get_mtime`
> +          and know that it only needs to add files if the mtime of the
> +          directory and files are newer than the stored timestamp.
> +
> +          .. note::
> +
> +                :meth:`get_mtime` function does not allow the caller to
> +                distinguish a timestamp of 0 from a non-existent timestamp. So
> +                don't store a timestamp of 0 unless you are comfortable with
> +                that.

Love the docs on how this is actually intended to be used!

> +
> +        :param mtime: A (time_t) timestamp

This description works for C-developers, but searching docs.python.org
for that will get people very confused.  I'd suggest:

:param mtime: A POSIX timestamp.
:type mtime: int or float

I'm suggesting to accept float here because `time.time()` returns a
float.  `os.mtime(...).st_mtime` is int though, so you could argue to
restrict it to int.

> +        :raises: :exc:`XapianError` a Xapian exception occurred, mtime
> +                 not stored
> +        :raises: :exc:`ReadOnlyDatabaseError` the database was opened
> +                 in read-only mode so directory mtime cannot be modified
> +        :raises: :exc:`NotInitializedError` the directory object has not
> +                 been initialized
> +        """
> +        ret = capi.lib.notmuch_directory_set_mtime(self._dir_p, mtime)

int(mtime) if you follow my suggestion above

> +
> +        if ret != capi.lib.NOTMUCH_STATUS_SUCCESS:
> +            raise errors.NotmuchError(ret)
> +
> +    def get_mtime(self):
> +        """Gets the mtime value of this directory in the database
> +
> +        Retrieves a previously stored mtime for this directory.
> +
> +        :param mtime: A (time_t) timestamp

I don't think there is a param ;)

> +        :raises: :exc:`NotmuchError`:
> +
> +                        :attr:`STATUS`.NOT_INITIALIZED
> +                          The directory has not been initialized

An old comment?

> +        """
> +        return capi.lib.notmuch_directory_get_mtime(self._dir_p)
> +
> +    # Make mtime attribute a property of Directory()
> +    mtime = property(
> +        get_mtime,
> +        set_mtime,
> +        doc="""Property that allows getting
> +                     and setting of the Directory *mtime* (read-write)
> +
> +                     See :meth:`get_mtime` and :meth:`set_mtime` for usage and
> +                     possible exceptions.""",
> +    )

Hmm, either we should have a get/set API or a property, not both in my
opinion.  I'm a bit conflicted on which is better, the code only has one
setter property currently, Database.decrypt_policy and that feels
somewhat similar so it could be good for consistency.  But maybe that
choice was a mistake.  In general assigning to a property does not
have the expectation of taking an action I think, and here you are
directly updating the database.

Anyway, I'm curious what other people think of this as well.

> +    def get_child_files(self):

Style-wise I'd name this `def files(self):`, I think it follows the
style of the API more.


> +        """Gets a Filenames iterator listing all the filenames of
> +        messages in the database within the given directory.
> +
> +        The returned filenames will be the basename-entries only (not
> +        complete paths.
> +        """
> +        fnames_p = capi.lib.notmuch_directory_get_child_files(self._dir_p)
> +        return PurePathIter(self, fnames_p)

Hmm, seeing it's use I think this should probably be an iterator
returning `pathlib.Path`, only giving a PurePath seems almost mean.  I'd
make sure they are already usable, instead of still having to get the
path property to make them useful, so pass in the path of the directory
to the iterator so it can return absolute paths.

> +
> +    def get_child_directories(self):

likewise `directories()` probably

> +        """Gets a :class:`Filenames` iterator listing all the filenames of
> +        sub-directories in the database within the given directory
> +
> +        The returned filenames will be the basename-entries only (not
> +        complete paths.
> +        """
> +        fnames_p = capi.lib.notmuch_directory_get_child_directories(self._dir_p)
> +        return PurePathIter(self, fnames_p)

same as for the files i guess

> +    @property
> +    def path(self):
> +        """Returns the absolute path of this Directory (read-only)"""

Some nitpicking, I'd document the return type (which should be changed
to pathlib.Path), also the `read-only` part is already implicit in the
signature, and it's a property so doesn't "return" things:

"""The path of this Directory as a pathlib.Path instance."""

> +        return self._path
> +
> +    def __repr__(self):
> +        """Object representation"""
> +        try:
> +            self._dir_p
> +        except errors.ObjectDestroyedError:
> +            return '<Directory(path={self.path}) (exhausted)>'.format(self=self)

The `(exhaused)` part seems odd, I'm even surprised you need this except
clause for this object.

> +        else:
> +            return '<Directory(path={self.path})>'.format(self=self)


Hope the many comments are acceptable, once again thanks for adding this
and feel free to disagree with some suggestions :)

Cheers,
Floris
_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-leave@notmuchmail.org

Thread: