Sanitize bidi control chars

Subject: Sanitize bidi control chars

Date: Mon, 10 Aug 2020 21:27:59 +0300

To: tomi.ollila@iki.fi, notmuch@notmuchmail.org

From: Teemu Likonen


* 2020-08-10 19:45:11+03, Teemu Likonen wrote:

> If we wanted to clean message headers from possible unpaired overrides
> we should clean all these:
>
>     U+202A LEFT-TO-RIGHT EMBEDDING (push)
>     U+202B RIGHT-TO-LEFT EMBEDDING (push)
>     U+202C POP DIRECTIONAL FORMATTING (pop)
>     U+202D LEFT-TO-RIGHT OVERRIDE (push)
>     U+202E RIGHT-TO-LEFT OVERRIDE (push)
>
> Or we could even try to be clever and count those characters and then
> insert or remove some of them so that there are as many "push"
> characters as "pop" characters.

Below is an example Emacs Lisp function that balances those "push" and
"pop" bidi control chars. This kind of code could be used to sanitize
message headers or any other text coming from a user.

I'm not even sure whether such a thing should be done in Emacs or in
lower-level Notmuch code. Anyway, I tried adding it to the
notmuch-sanitize function. With that change, Tomi's message no longer
switches the direction of other text (in a notmuch-search-mode buffer).
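
One way to try the same thing without editing Notmuch itself might be
to advise notmuch-sanitize so that its return value is passed through
the balancing function defined below. This is only a sketch of an
alternative, not the change described above: it assumes that
notmuch-sanitize lives in the notmuch-lib feature and that
notmuch-balance-bidi-ctrl-chars has already been evaluated.

```elisp
;; Sketch only: assumes `notmuch-sanitize' is provided by the
;; `notmuch-lib' feature and that `notmuch-balance-bidi-ctrl-chars'
;; (defined later in this message) has been evaluated.
(with-eval-after-load 'notmuch-lib
  ;; :filter-return passes the advised function's return value
  ;; through the given function.
  (advice-add 'notmuch-sanitize :filter-return
              #'notmuch-balance-bidi-ctrl-chars))
```

The advice can be removed again with advice-remove, using the same two
function names.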


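(require 'cl-lib)
(require 'seq)

(defun notmuch-balance-bidi-ctrl-chars (string)
  "Return STRING with its bidi \"push\" and \"pop\" control chars balanced.
Drop every pop char that has no matching push char, and append the
pop chars that are still missing at the end of the string."
  (let ((new nil)
        (stack-count 0))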

    (cl-flet ((push-char-p (c)
                ;; U+202A LEFT-TO-RIGHT EMBEDDING
                ;; U+202B RIGHT-TO-LEFT EMBEDDING
                ;; U+202D LEFT-TO-RIGHT OVERRIDE
                ;; U+202E RIGHT-TO-LEFT OVERRIDE
                (cl-find c '(?\x202a ?\x202b ?\x202d ?\x202e)))
              (pop-char-p (c)
                ;; U+202C POP DIRECTIONAL FORMATTING
                (eql c ?\x202c)))

      (cl-loop
       for char across string
       do (cond ((push-char-p char)
                 (cl-incf stack-count)
                 (push char new))
                ((and (pop-char-p char)
                      (cl-plusp stack-count))
                 (cl-decf stack-count)
                 (push char new))
                ((and (pop-char-p char)
                      (not (cl-plusp stack-count)))
                 ;; The stack is empty. Ignore this pop char.
                 )
                (t (push char new)))))

    ;; Add missing pops.
    (cl-loop
     repeat stack-count
     do (push ?\x202c new))

    (seq-into (nreverse new) 'string)))
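
To check the result, a small predicate that tests whether a string is
already balanced could look like the sketch below. The name
my-bidi-balanced-p is made up for this illustration and is not part of
Notmuch. It uses the same five control chars as the function above.

```elisp
(defun my-bidi-balanced-p (string)
  "Return non-nil if the bidi push and pop chars in STRING are balanced.
Hypothetical helper for illustration, not part of Notmuch."
  (let ((depth 0)
        (ok t))
    (dolist (char (string-to-list string))
      (cond
       ;; U+202A, U+202B, U+202D, U+202E: push chars.
       ((memql char '(?\x202a ?\x202b ?\x202d ?\x202e))
        (setq depth (1+ depth)))
       ;; U+202C: pop char. A pop on an empty stack is an error.
       ((eql char ?\x202c)
        (if (> depth 0)
            (setq depth (1- depth))
          (setq ok nil)))))
    ;; Balanced only if no stray pop was seen and nothing is left open.
    (and ok (zerop depth))))

(my-bidi-balanced-p (string ?a ?\x202e ?b))          ; unpaired push: nil
(my-bidi-balanced-p (string ?a ?\x202e ?b ?\x202c))  ; balanced: t
(my-bidi-balanced-p (string ?\x202c ?a ?\x202e))     ; pop before push: nil
```

Applied to the output of notmuch-balance-bidi-ctrl-chars, the predicate
should return non-nil for any input string.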



-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450