The following Unicode bidirectional control characters are modal, that
is, they push a new bidirectional rendering mode onto a stack:

    U+202A LEFT-TO-RIGHT EMBEDDING
    U+202B RIGHT-TO-LEFT EMBEDDING
    U+202D LEFT-TO-RIGHT OVERRIDE
    U+202E RIGHT-TO-LEFT OVERRIDE

Every mode must be terminated with the character U+202C POP DIRECTIONAL
FORMATTING, which pops the mode from the stack. The stack is per
paragraph: a new text paragraph resets the rendering mode changed by
these control characters.

This change adds a new function "notmuch-balance-bidi-ctrl-chars", which
reads its STRING argument and ensures that every push character (U+202A,
U+202B, U+202D, U+202E) has a matching pop character (U+202C). The
function may add more U+202C characters at the end of the returned
string, or it may remove some U+202C characters. The returned string is
safe in the sense that it won't change the surrounding bidirectional
rendering mode. This function should be used when sanitizing arbitrary
input.
---
 emacs/notmuch-lib.el | 54 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 118faf1e..e6252c6c 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -469,6 +469,60 @@ be displayed."
         "[No Subject]"
       subject)))
 
+
+(defun notmuch-balance-bidi-ctrl-chars (string)
+  "Balance bidirectional control chars in STRING.
+
+The following Unicode bidirectional control chars are modal,
+that is, they push a new bidirectional rendering mode to a stack:
+U+202A LEFT-TO-RIGHT EMBEDDING, U+202B RIGHT-TO-LEFT EMBEDDING,
+U+202D LEFT-TO-RIGHT OVERRIDE and U+202E RIGHT-TO-LEFT OVERRIDE.
+Every mode must be terminated with the character U+202C POP
+DIRECTIONAL FORMATTING which pops the mode from the stack. The
+stack is per paragraph. A new text paragraph resets the rendering
+mode changed by these control characters.
+
+This function reads the STRING argument and ensures that all push
+characters (U+202A, U+202B, U+202D, U+202E) have a matching pop
+character (U+202C). The function may add more U+202C characters at
+the end of the returned string, or it may remove some U+202C
+characters. The returned string is safe in the sense that it
+won't change the surrounding bidirectional rendering mode. This
+function should be used when sanitizing arbitrary input."
+
+  (let ((new-string nil)
+        (stack-count 0))
+
+    (cl-flet ((push-char-p (c)
+                ;; U+202A LEFT-TO-RIGHT EMBEDDING
+                ;; U+202B RIGHT-TO-LEFT EMBEDDING
+                ;; U+202D LEFT-TO-RIGHT OVERRIDE
+                ;; U+202E RIGHT-TO-LEFT OVERRIDE
+                (cl-find c '(?\u202a ?\u202b ?\u202d ?\u202e)))
+              (pop-char-p (c)
+                ;; U+202C POP DIRECTIONAL FORMATTING
+                (eql c ?\u202c)))
+
+      (cl-loop for char across string
+               do (cond ((push-char-p char)
+                         (cl-incf stack-count)
+                         (push char new-string))
+                        ((and (pop-char-p char)
+                              (cl-plusp stack-count))
+                         (cl-decf stack-count)
+                         (push char new-string))
+                        ((and (pop-char-p char)
+                              (not (cl-plusp stack-count)))
+                         ;; The stack is empty. Ignore this pop character.
+                         )
+                        (t (push char new-string)))))
+
+    ;; Add possible missing pop characters.
+    (cl-loop repeat stack-count
+             do (push ?\u202c new-string))
+
+    (seq-into (nreverse new-string) 'string)))
+
 (defun notmuch-sanitize (str)
   "Sanitize control character in STR.
 
-- 
2.20.1
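
For readers who want to try the balancing rule without applying the patch, here is a minimal standalone sketch of the same idea. The name `my-balance-bidi-sketch` is hypothetical and is not part of notmuch; it is a simplified stand-in that assumes only `cl-lib`, not the patch's exact code. It keeps a depth counter, drops any U+202C seen at depth zero, and appends one U+202C per push character still open at the end.

```elisp
(require 'cl-lib)

(defun my-balance-bidi-sketch (string)
  "Return STRING with balanced bidi push/pop control characters.
Hypothetical illustration only; not the function added by the patch."
  (let ((depth 0)   ; number of currently open push characters
        (out nil))  ; result characters, collected in reverse order
    (dolist (c (string-to-list string))
      (cond
       ;; Push characters: U+202A, U+202B, U+202D, U+202E.
       ((memq c '(?\u202a ?\u202b ?\u202d ?\u202e))
        (cl-incf depth)
        (push c out))
       ;; Pop character U+202C: keep it only if something is open.
       ((eq c ?\u202c)
        (when (> depth 0)
          (cl-decf depth)
          (push c out)))
       ;; Any other character is copied unchanged.
       (t (push c out))))
    ;; Close every mode that is still open.
    (dotimes (_ depth)
      (push ?\u202c out))
    (concat (nreverse out))))

;; A missing pop is appended:
(equal (my-balance-bidi-sketch "\u202Eabc") "\u202Eabc\u202C")      ; t
;; A stray pop is removed:
(equal (my-balance-bidi-sketch "abc\u202C") "abc")                  ; t
;; Only the unmatched second pop is dropped:
(equal (my-balance-bidi-sketch "\u202Axy\u202C\u202Cz") "\u202Axy\u202Cz") ; t
```

With the patch applied, the same three inputs should give the same results from `notmuch-balance-bidi-ctrl-chars`, since both follow the counting rule described in the commit message.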