Re: [PATCH v2 1/7] Make keys of notmuch-tag-formats regexps and use caching

Subject: Re: [PATCH v2 1/7] Make keys of notmuch-tag-formats regexps and use caching

Date: Mon, 10 Mar 2014 22:06:59 -0400

To: Mark Walters, notmuch@notmuchmail.org

Cc:

From: Austin Clements


On Wed, 19 Feb 2014, Mark Walters <markwalters1009@gmail.com> wrote:
> From: Austin Clements <amdragon@MIT.EDU>
>
> This patch switches notmuch-tag-formats to use regexps with caching
> for performance.
>
> We have to clear the cache somehow on changes to notmuch-tag-formats.
> This version takes the simplest approach: search/show/tree all clear
> the cache whenever they start loading.
>
> We cannot use assoc-default since there's no way to distinguish a
> missing key from a present key with a null cdr: thus, we use assoc*
> from cl instead.
>
> Performance-wise, the caching of regexp lookup makes this at least as
> fast as the previous code using assoc (see
> id:1392226351-31440-1-git-send-email-amdragon@mit.edu for timing
> details).

How about this for a commit message?

- 8< -

This modifies `notmuch-tag-format-tag' to treat the keys of
`notmuch-tag-formats' as (anchored) regexps, rather than literal
strings.  This is clearly more flexible, as it allows for prefix
matching, defining a fallback format, etc.  This may cause compatibility
problems if people have customized `notmuch-tag-formats' to match tags
that contain regexp specials, but this seems unlikely.

Regular expression matching has quite a performance hit over string
lookup, so this also introduces a simple cache from exact tags to
formatted strings.  The number of unique tags is likely to be quite
small, so this cache should have a high hit rate.  In addition to
eliminating the regexp lookup in the common case, this cache stores
fully formatted tags, eliminating the repeated evaluation of potentially
expensive, user-specified formatting code.  This makes regexp lookup at
least as fast as assoc for unformatted tags (e.g., inbox) and *faster*
than the current code for formatted tags (e.g., unread):

                    inbox (usec)   unread (usec)
    assoc:              0.4            2.8
    regexp:             3.2            7.2
    regexp+caching:     0.4            0.4

(Though even at 7.2 usec, tag formatting is not our top bottleneck.)

This cache must be explicitly cleared to keep it coherent, so this adds
the appropriate clearing calls.

- >8 -

> ---
>  emacs/notmuch-show.el |    1 +
>  emacs/notmuch-tag.el  |   70 +++++++++++++++++++++++++++++++++---------------
>  emacs/notmuch-tree.el |    1 +
>  emacs/notmuch.el      |    1 +
>  4 files changed, 51 insertions(+), 22 deletions(-)
>
> diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
> index 1ac80ca..4bddf6c 100644
> --- a/emacs/notmuch-show.el
> +++ b/emacs/notmuch-show.el
> @@ -1145,6 +1145,7 @@ function is used."
>      ;; Don't track undo information for this buffer
>      (set 'buffer-undo-list t)
>  
> +    (notmuch-tag-clear-cache)
>      (erase-buffer)
>      (goto-char (point-min))
>      (save-excursion
> diff --git a/emacs/notmuch-tag.el b/emacs/notmuch-tag.el
> index 908e7ad..47e0205 100644
> --- a/emacs/notmuch-tag.el
> +++ b/emacs/notmuch-tag.el
> @@ -28,23 +28,39 @@
>  (require 'crm)
>  (require 'notmuch-lib)
>  
> +;; (notmuch-tag-clear-cache will be called by the defcustom
> +;; notmuch-tag-formats, so it has to be defined first.)

This is no longer true, so this comment can be removed and the following
two defs can be moved to a more natural location further down in the
file.

> +
> +(defvar notmuch-tag--format-cache (make-hash-table :test 'equal)
> +  "Cache of tag format lookup.  Internal to `notmuch-tag-format-tag'.")
> +
> +(defun notmuch-tag-clear-cache ()
> +  "Clear the internal cache of tag formats.
> +
> +This must be called after changes to `notmuch-tag-formats'."

Likewise, this sentence is out of date.  I would just omit it and add
the doc I suggest below.

> +  (clrhash notmuch-tag--format-cache))
> +
>  (defcustom notmuch-tag-formats
>    '(("unread" (propertize tag 'face '(:foreground "red")))
>      ("flagged" (propertize tag 'face '(:foreground "blue"))
>       (notmuch-tag-format-image-data tag (notmuch-tag-star-icon))))
>    "Custom formats for individual tags.
>  
> -This gives a list that maps from tag names to lists of formatting
> -expressions.  The car of each element gives a tag name and the
> -cdr gives a list of Elisp expressions that modify the tag.  If
> -the list is empty, the tag will simply be hidden.  Otherwise,
> -each expression will be evaluated in order: for the first
> -expression, the variable `tag' will be bound to the tag name; for
> -each later expression, the variable `tag' will be bound to the
> -result of the previous expression.  In this way, each expression
> -can build on the formatting performed by the previous expression.
> -The result of the last expression will displayed in place of the
> -tag.
> +This is an association list that maps from tag name regexps to
> +lists of formatting expressions.  The first entry whose car
> +regexp-matches a tag will be used to format that tag.  The regexp
> +is implicitly anchored, so to match a literal tag name, just use
> +that tag name (if it contains special regexp characters like
> +\".\" or \"*\", these have to be escaped).  The cdr of the
> +matching entry gives a list of Elisp expressions that modify the
> +tag.  If the list is empty, the tag will simply be hidden.
> +Otherwise, each expression will be evaluated in order: for the
> +first expression, the variable `tag' will be bound to the tag
> +name; for each later expression, the variable `tag' will be bound
> +to the result of the previous expression.  In this way, each
> +expression can build on the formatting performed by the previous
> +expression.  The result of the last expression will displayed in
> +place of the tag.
>  
>  For example, to replace a tag with another string, simply use
>  that string as a formatting expression.  To change the foreground
> @@ -56,7 +72,7 @@ with images."
>  
>    :group 'notmuch-search
>    :group 'notmuch-show
> -  :type '(alist :key-type (string :tag "Tag")
> +  :type '(alist :key-type (regexp :tag "Tag")
>  		:extra-offset -3
>  		:value-type
>  		(radio :format "%v"
> @@ -137,16 +153,26 @@ This can be used with `notmuch-tag-format-image-data'."
>  
>  (defun notmuch-tag-format-tag (tag)
>    "Format TAG by looking into `notmuch-tag-formats'."

This would be a great place to mention that modes need to call
`notmuch-tag-clear-cache' if they intent to use formatted tags.

  "Format TAG according to `notmuch-tag-formats'.

Callers must ensure that the tag format cache has been recently cleared
via `notmuch-tag-clear-cache' before using this function.  For example,
it would be appropriate to clear the cache just prior to filling a
buffer that uses formatted tags."

> -  (let ((formats (assoc tag notmuch-tag-formats)))
> -    (cond
> -     ((null formats)		;; - Tag not in `notmuch-tag-formats',
> -      tag)			;;   the format is the tag itself.
> -     ((null (cdr formats))	;; - Tag was deliberately hidden,
> -      nil)			;;   no format must be returned
> -     (t				;; - Tag was found and has formats,
> -      (let ((tag tag))		;;   we must apply all the formats.
> -	(dolist (format (cdr formats) tag)
> -	  (setq tag (eval format))))))))
> +  (let ((formatted (gethash tag notmuch-tag--format-cache 'missing)))
> +    (when (eq formatted 'missing)
> +      (let* ((formats
> +	      (save-match-data

This is a better place for the comment about assoc* that's currently in
the commit message.

;; Don't use assoc-default since there's no way to distinguish a missing
;; key from a present key with a null cdr:.

> +		(assoc* tag notmuch-tag-formats
> +			:test (lambda (tag key)
> +				(and (eq (string-match key tag) 0)
> +				     (= (match-end 0) (length tag))))))))
> +	(setq formatted
> +	      (cond
> +	       ((null formats)		;; - Tag not in `notmuch-tag-formats',
> +		tag)			;;   the format is the tag itself.
> +	       ((null (cdr formats))	;; - Tag was deliberately hidden,
> +		nil)			;;   no format must be returned
> +	       (t			;; - Tag was found and has formats,
> +		(let ((tag tag))	;;   we must apply all the formats.
> +		  (dolist (format (cdr formats) tag)
> +		    (setq tag (eval format)))))))
> +	(puthash tag formatted notmuch-tag--format-cache)))
> +    formatted))
>  
>  (defun notmuch-tag-format-tags (tags &optional face)
>    "Return a string representing formatted TAGS."
> diff --git a/emacs/notmuch-tree.el b/emacs/notmuch-tree.el
> index 4f2ac02..a106e09 100644
> --- a/emacs/notmuch-tree.el
> +++ b/emacs/notmuch-tree.el
> @@ -881,6 +881,7 @@ the same as for the function notmuch-tree."
>  	 (message-arg "--entire-thread"))
>      (if (equal (car (process-lines notmuch-command "count" search-args)) "0")
>  	(setq search-args basic-query))
> +    (notmuch-tag-clear-cache)
>      (let ((proc (notmuch-start-notmuch
>  		 "notmuch-tree" (current-buffer) #'notmuch-tree-process-sentinel
>  		 "show" "--body=false" "--format=sexp"
> diff --git a/emacs/notmuch.el b/emacs/notmuch.el
> index 0471750..0c767f7 100644
> --- a/emacs/notmuch.el
> +++ b/emacs/notmuch.el
> @@ -888,6 +888,7 @@ the configured default sort order."
>      (set 'notmuch-search-oldest-first oldest-first)
>      (set 'notmuch-search-target-thread target-thread)
>      (set 'notmuch-search-target-line target-line)
> +    (notmuch-tag-clear-cache)
>      (let ((proc (get-buffer-process (current-buffer)))
>  	  (inhibit-read-only t))
>        (if proc
> -- 
> 1.7.9.1

Thread: