[PATCH 1/2] Emacs: Add a new function for balancing bidi control chars

Subject: [PATCH 1/2] Emacs: Add a new function for balancing bidi control chars
Date: Sat, 15 Aug 2020 12:30:35 +0300
To: notmuch@notmuchmail.org
Cc: tomi.ollila@iki.fi
From: Teemu Likonen


The following Unicode bidirectional control characters are modal,
meaning that each one pushes a new bidirectional rendering mode onto
a stack:

    U+202A LEFT-TO-RIGHT EMBEDDING
    U+202B RIGHT-TO-LEFT EMBEDDING
    U+202D LEFT-TO-RIGHT OVERRIDE
    U+202E RIGHT-TO-LEFT OVERRIDE

Every mode must be terminated with the character U+202C POP
DIRECTIONAL FORMATTING, which pops the mode from the stack. The stack
is per paragraph: a new text paragraph resets any rendering mode
changed by these control characters.

This change adds a new function "notmuch-balance-bidi-ctrl-chars"
which reads its STRING argument and ensures that every push character
(U+202A, U+202B, U+202D, U+202E) has a matching pop character
(U+202C). It appends any missing U+202C characters to the end of the
returned string and removes any U+202C characters that have no
matching push character. The returned string is safe in the sense
that it won't change the surrounding bidirectional rendering mode.
This function should be used when sanitizing arbitrary input.
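A few illustrative calls follow, with the control characters written
as \u escapes. This is a sketch that assumes notmuch-lib.el with this
patch applied is on `load-path'; the results are derived by hand from
the implementation below, not from a test suite.

```emacs-lisp
;; Assumes notmuch-lib.el (with this patch applied) is on `load-path'.
(require 'notmuch-lib)

;; A push character without a pop: one U+202C is appended.
(notmuch-balance-bidi-ctrl-chars "\u202Eabc")
;; => "\u202Eabc\u202C"

;; A pop character with an empty stack: it is dropped.
(notmuch-balance-bidi-ctrl-chars "\u202Cabc")
;; => "abc"

;; Both in one call: the leading pop is dropped, a pop is appended.
(notmuch-balance-bidi-ctrl-chars "\u202Cx\u202E")
;; => "x\u202E\u202C"

;; Nested modes: one U+202C is appended per unclosed push character.
(notmuch-balance-bidi-ctrl-chars "\u202A\u202Bx")
;; => "\u202A\u202Bx\u202C\u202C"
```

Counting open modes with a single integer is enough here because
every push character is closed by the same pop character, so the
actual stack contents never need to be inspected.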
---
 emacs/notmuch-lib.el | 54 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 118faf1e..e6252c6c 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -469,6 +469,60 @@ be displayed."
 	"[No Subject]"
       subject)))
 
+
+(defun notmuch-balance-bidi-ctrl-chars (string)
+  "Balance bidirectional control chars in STRING.
+
+The following Unicode bidirectional control chars are modal, meaning
+that each one pushes a new bidirectional rendering mode onto a stack:
+U+202A LEFT-TO-RIGHT EMBEDDING, U+202B RIGHT-TO-LEFT EMBEDDING,
+U+202D LEFT-TO-RIGHT OVERRIDE and U+202E RIGHT-TO-LEFT OVERRIDE.
+Every mode must be terminated with the character U+202C POP
+DIRECTIONAL FORMATTING, which pops the mode from the stack. The
+stack is per paragraph: a new paragraph resets any rendering mode
+changed by these control characters.
+
+This function reads the STRING argument and ensures that every push
+character (U+202A, U+202B, U+202D, U+202E) has a matching pop
+character (U+202C). It appends any missing U+202C characters to the
+end of the returned string and removes any U+202C characters that
+have no matching push character. The returned string is safe in the
+sense that it won't change the surrounding bidirectional rendering
+mode. Use this function when sanitizing arbitrary input."
+
+  (let ((new-string nil)
+	(stack-count 0))
+
+    (cl-flet ((push-char-p (c)
+		;; U+202A LEFT-TO-RIGHT EMBEDDING
+		;; U+202B RIGHT-TO-LEFT EMBEDDING
+		;; U+202D LEFT-TO-RIGHT OVERRIDE
+		;; U+202E RIGHT-TO-LEFT OVERRIDE
+		(cl-find c '(?\u202a ?\u202b ?\u202d ?\u202e)))
+	      (pop-char-p (c)
+		;; U+202C POP DIRECTIONAL FORMATTING
+		(eql c ?\u202c)))
+
+      (cl-loop for char across string
+	       do (cond ((push-char-p char)
+			 (cl-incf stack-count)
+			 (push char new-string))
+			((and (pop-char-p char)
+			      (cl-plusp stack-count))
+			 (cl-decf stack-count)
+			 (push char new-string))
+			((and (pop-char-p char)
+			      (not (cl-plusp stack-count)))
+			 ;; The stack is empty. Ignore this pop character.
+			 )
+			(t (push char new-string)))))
+
+    ;; Add possible missing pop characters.
+    (cl-loop repeat stack-count
+	     do (push ?\x202c new-string))
+
+    (seq-into (nreverse new-string) 'string)))
+
 (defun notmuch-sanitize (str)
   "Sanitize control character in STR.
 
-- 
2.20.1
_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org