Re: [PATCH 1/1] python/notmuch2: provide binding for database_get_directory()

Subject: Re: [PATCH 1/1] python/notmuch2: provide binding for database_get_directory()

Date: Sun, 25 Jul 2021 22:46:47 +0200

To: notmuch@notmuchmail.org

Cc:

From: Ludovic LANGE


Hi Floris,

Thanks for taking the time to make a review, this is very appreciated !

I'll answer below, but please note : for the comments, I was being lazy,
and they come directly from the previous python bindings. I was
expecting that the whole API was exactly the same and only differing in
implementation, so I was hoping that I could copy/paste them. It seems I
was wrong.

I'll send an updated version soon.


Le 25/07/2021 à 22:18, Floris Bruynooghe a écrit :
> @@ -338,7 +339,46 @@ class Database(base.NotmuchObject):
>>          return DbRevision(rev, capi.ffi.string(raw_uuid[0]))
>>  
>>      def get_directory(self, path):
>> -        raise NotImplementedError
>> +        """Returns a :class:`Directory` from the database for path,
>> +
>> +        :param path: An unicode string containing the path relative to the path
>> +              of database (see :attr:`path`), or else should be an absolute
>> +              path with initial components that match the path of 'database'.
> Instead of a unicode string instead it should accept anything that
> `os.fspath` accepts: str, bytes or os.PathLike.  This allows using
> e.g. pathlib.Path as well.
agreed.
>> +        :returns: :class:`Directory` or raises an exception.
>> +        :raises: :exc:`FileError` if path is not relative database or absolute
>> +                 with initial components same as database.
>> +
>> +
>> +        :raises XapianError: A Xapian exception occurred.
>> +        :raises LookupError: The directory object referred to by ``pathname``
> path or pathname?  choose one ;)
agreed.
>> +           does not exist in the database.
>> +        :raises FileNotEmailError: The file referreed to by
>> +           ``pathname`` is not recognised as an email message.
> same
agreed.
>> +        :raises UpgradeRequiredError: The database must be upgraded
>> +           first.
>> +        """
>> +        if not hasattr(os, 'PathLike') and isinstance(path, pathlib.Path):
>> +            path = bytes(path)
> I think `path = os.fspath(path)` should handle this fine.  But I don't
> think you even need to do that as `os.fsencode()` will just do the right
> thing already.

This is a construct I saw multiple times in the same file (_database.py)
of the notmuch2 python bindings. Not familiar with it, I guessed I
should do the same also.

I'm ok to remove it, but for consistency a more general approach should
be taken for the rest of the ~6 times it occurs in this file. I'd be
happy to have more insights on why you (as you seem to be the author of
the new bindings, and of these constructs) had to put them here the
first time ?

>> +        directory_pp = capi.ffi.new('notmuch_directory_t **')
>> +        ret = capi.lib.notmuch_database_get_directory(
>> +          self._db_p, os.fsencode(path), directory_pp)
>> +
>> +        if ret != capi.lib.NOTMUCH_STATUS_SUCCESS:
>> +            raise errors.NotmuchError(ret)
>> +
>> +        directory_p = directory_pp[0]
>> +        if directory_p == capi.ffi.NULL:
>> +            raise LookupError
>> +
>> +        if os.path.isabs(path):
>> +            # we got an absolute path
>> +            abs_dirpath = path
>> +        else:
>> +            #we got a relative path, make it absolute
>> +            abs_dirpath = os.path.abspath(os.path.join(self.get_path(), path))
>> +
>> +        ret_dir = directory.Directory(abs_dirpath, directory_p, self)
> I think the code has been trying to use pathlib for path manipulations,
> so for consistency it'd be good to keep doing that  This if condition
> becomes much simpler:
>
> path = pathlib.Path(path)  # this is a no-op if it already is an instance
> ret_dir = directory.Directory(path.absolute(), directory_p, self)

I'll try this, thanks.


>> +class PurePathIter(FilenamesIter):
>> +    """Iterator for pathlib.PurePath objects."""
>> +
>> +    def __next__(self):
>> +        fname = super().__next__()
>> +        return pathlib.PurePath(os.fsdecode(fname))
> I'm surprised you explicitly need a PurePath here?

Not needed. My reasoning was : a Directory is returning real-paths
(directories, files) on the filesystem, but the API is only directed
towards manipulating the notmuch database, not the underlying filesystem
(otherwise we would have to re-run notmuch to synch the differences). So
by returning a PurePath (as opposed to a concrete Path) it could be a
signal to the user not to mess with the undeerlying filesystem.


>> +    def __init__(self, path, dir_p, parent):
>> +        """
>> +        :param path:   The absolute path of the directory object.
> For consistency I'd require this to be `pathlib.Path`.

Ok, the idea here was: this is an object returned by the API, and not
expected to be instantiated by the end-user.

I'll do that nevertheless.

>> +        :param dir_p:  The pointer to an internal notmuch_directory_t object.
>> +        :param parent: The object this Directory is derived from
>> +                       (usually a :class:`Database`). We do not directly use
>> +                       this, but store a reference to it as long as
>> +                       this Directory object lives. This keeps the
>> +                       parent object alive.
>> +        """
> Just add this docstring the the class docstring, that's the usual way to
> document __init__.
Why not - here also it's a copy/paste.
>> +
>> +        :param mtime: A (time_t) timestamp
> This description works for C-developers, but searching docs.python.org
> for that will get people very confused.  I'd suggest:
>
> :param mtime: A POSIX timestamp.
> :type mtime: int or float
>
> I'm suggesting to accept float here because `time.time()` returns a
> float.  `os.mtime(...).st_mtime` is int though, so you could argue to
> restrict it to int.
Let me check that.
>
>> +        :raises: :exc:`XapianError` a Xapian exception occurred, mtime
>> +                 not stored
>> +        :raises: :exc:`ReadOnlyDatabaseError` the database was opened
>> +                 in read-only mode so directory mtime cannot be modified
>> +        :raises: :exc:`NotInitializedError` the directory object has not
>> +                 been initialized
>> +        """
>> +        ret = capi.lib.notmuch_directory_set_mtime(self._dir_p, mtime)
> int(mtime) if you follow my suggestion above
Ok
>
>> +
>> +        if ret != capi.lib.NOTMUCH_STATUS_SUCCESS:
>> +            raise errors.NotmuchError(ret)
>> +
>> +    def get_mtime(self):
>> +        """Gets the mtime value of this directory in the database
>> +
>> +        Retrieves a previously stored mtime for this directory.
>> +
>> +        :param mtime: A (time_t) timestamp
> I don't think there is a param ;)
You're right. Copy/paste is such a bad thing.
>> +        :raises: :exc:`NotmuchError`:
>> +
>> +                        :attr:`STATUS`.NOT_INITIALIZED
>> +                          The directory has not been initialized
> An old comment?

I'll have a look at it.


>> +        """
>> +        return capi.lib.notmuch_directory_get_mtime(self._dir_p)
>> +
>> +    # Make mtime attribute a property of Directory()
>> +    mtime = property(
>> +        get_mtime,
>> +        set_mtime,
>> +        doc="""Property that allows getting
>> +                     and setting of the Directory *mtime* (read-write)
>> +
>> +                     See :meth:`get_mtime` and :meth:`set_mtime` for usage and
>> +                     possible exceptions.""",
>> +    )
> Hmm, either we should have a get/set API or a property, not both in my
> opinion.  I'm a bit conflicted on which is better, the code only has one
> setter property currently, Database.decrypt_policy and that feels
> somewhat similar so it could be good for consistency.  But maybe that
> choice was a mistake.  In general assigning to a property does not
> have the expectation of taking an action I think, and here you are
> directly updating the database.
>
> Anyway, I'm curious what other people think of this as well.

Copy/paste (!!) of the previous bindings. As I don't know how people are
using them, I was expecting to make as little changes as necessary, so
that code change would be kept to a minimal when porting from old
bindings to new bindings.

Waiting for other feedback also.


>> +    def get_child_files(self):
> Style-wise I'd name this `def files(self):`, I think it follows the
> style of the API more.

Same reasoning as before - not wanting to make too much changes from old
to new bindings.

But if the consensus is on having a new style, we will of course adapt.


>> +        """Gets a Filenames iterator listing all the filenames of
>> +        messages in the database within the given directory.
>> +
>> +        The returned filenames will be the basename-entries only (not
>> +        complete paths.
>> +        """
>> +        fnames_p = capi.lib.notmuch_directory_get_child_files(self._dir_p)
>> +        return PurePathIter(self, fnames_p)
> Hmm, seeing it's use I think this should probably be an iterator
> returning `pathlib.Path`, only giving a PurePath seems almost mean.  I'd
> make sure they are already usable, instead of still having to get the
> path property to make them useful, so pass in the path of the directory
> to the iterator so it can return absolute paths.

I'm mean :-)

I explained previously my approach, but don't mind giving a "workable"
path. Will do.


>
>> +
>> +    def get_child_directories(self):
> likewise `directories()` probably

Same idea as get_child_files().  I'm curious what other people think of
this as well.


>> +        """Gets a :class:`Filenames` iterator listing all the filenames of
>> +        sub-directories in the database within the given directory
>> +
>> +        The returned filenames will be the basename-entries only (not
>> +        complete paths.
>> +        """
>> +        fnames_p = capi.lib.notmuch_directory_get_child_directories(self._dir_p)
>> +        return PurePathIter(self, fnames_p)
> same as for the files i guess

OK

>> +    @property
>> +    def path(self):
>> +        """Returns the absolute path of this Directory (read-only)"""
> Some nitpicking, I'd document the return type (which should be changed
> to pathlib.Path), also the `read-only` part is already implicit in the
> signature, and it's a property so doesn't "return" things:
>
> """The path of this Directory as a pathlib.Path instance."""
Ok
>> +        return self._path
>> +
>> +    def __repr__(self):
>> +        """Object representation"""
>> +        try:
>> +            self._dir_p
>> +        except errors.ObjectDestroyedError:
>> +            return '<Directory(path={self.path}) (exhausted)>'.format(self=self)
> The `(exhaused)` part seems odd, I'm even surprised you need this except
> clause for this object.

Inspiration came from TagsIter / NotmuchIter - but I didn't try to
reproduced it on this object. I don't know which moves could make you
have dangling references or destroyed objects here.

If you prefer I'll simplify this repr.


> Hope the many comments are acceptable, once again thanks for adding this
> and feel free to disagree with some suggestions :)

Thanks again !

I'll submit a new patch soon.

Regards,

Ludovic

_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-leave@notmuchmail.org

Thread: