Re: [PATCH v1 1/2] emacs: Observe the charset of MIME parts when reading them.

Subject: Re: [PATCH v1 1/2] emacs: Observe the charset of MIME parts when reading them.

Date: Mon, 02 May 2016 08:37:46 +0100

To: David Edmondson, notmuch@notmuchmail.org

Cc:

From: Mark Walters


On Sat, 30 Apr 2016, David Edmondson <dme@dme.org> wrote:
> `notmuch--get-bodypart-raw' previously assumed that all non-binary MIME
> parts could be successfully read by assuming that they were UTF-8
> encoded. This was demonstrated to be wrong, specifically when a part was
> marked as ISO8859-1 and included accented characters (which were
> incorrectly rendered as a result).
>
> Rather than assuming UTF-8, attempt to use the part's declared charset
> when reading it, falling back to US-ASCII if the declared charset is
> unknown, unsupported or invalid.
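
The fallback described above can be illustrated outside Emacs. What follows is a minimal Python sketch, not the patch itself. Python's codec registry stands in for Emacs coding systems and `mm-charset-to-coding-system`, and `decode_part` is a hypothetical helper name. It also reproduces the original symptom: ISO-8859-1 bytes read as UTF-8 do not come out as the intended accented characters.

```python
from typing import Optional


def decode_part(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode a MIME part's body using its declared charset (hypothetical helper).

    Falls back to US-ASCII when the charset is missing, unknown or not a
    text encoding, mirroring the fallback described in the commit message.
    """
    if declared_charset:
        try:
            # errors="replace" turns undecodable bytes into U+FFFD rather than raising.
            return raw.decode(declared_charset, errors="replace")
        except LookupError:
            # Unknown charset name, or a non-text codec such as "base64".
            pass
    # Default charset for a text part without a usable charset parameter.
    return raw.decode("us-ascii", errors="replace")


latin1_body = "café".encode("iso-8859-1")  # b"caf\xe9"
print(decode_part(latin1_body, "ISO-8859-1"))  # decoded as declared
print(decode_part(latin1_body, "utf-8"))       # the old UTF-8 assumption
```

Decoding `b"caf\xe9"` with its declared ISO-8859-1 charset yields "café", whereas assuming UTF-8, or falling back to US-ASCII for an unknown charset, yields "caf\ufffd". A pair of inputs like this is roughly what a regression test for the bug might exercise.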

As this seemed hard to test (if I understand the bug correctly, it didn't
show up in my test of the entire performance corpus, though of course my
testing could have been wrong), would it be possible to add a test for
it?

Best wishes

Mark


> ---
>  emacs/notmuch-lib.el | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
> index 78978ee..f05ded6 100644
> --- a/emacs/notmuch-lib.el
> +++ b/emacs/notmuch-lib.el
> @@ -23,6 +23,7 @@
>  
>  ;;; Code:
>  
> +(require 'mm-util)
>  (require 'mm-view)
>  (require 'mm-decode)
>  (require 'cl)
> @@ -572,7 +573,20 @@ the given type."
>  				   ,@(when process-crypto '("--decrypt"))
>  				   ,(notmuch-id-to-query (plist-get msg :id))))
>  			   (coding-system-for-read
> -			    (if binaryp 'no-conversion 'utf-8)))
> +			    (if binaryp 'no-conversion
> +			      (let ((coding-system (mm-charset-to-coding-system
> +						    (plist-get part :content-charset))))
> +				;; Sadly,
> +				;; `mm-charset-to-coding-system' seems
> +				;; to return things that are not
> +				;; considered acceptable values for
> +				;; `coding-system-for-read'.
> +				(if (coding-system-p coding-system)
> +				    coding-system
> +				  ;; RFC 2045 says that the default
> +				  ;; charset is US-ASCII. RFC 6657
> +				  ;; complicates this somewhat.
> +				  'us-ascii)))))
>  		       (apply #'call-process notmuch-command nil '(t nil) nil args)
>  		       (buffer-string))))))
>      (when (and cache data)
> -- 
> 2.7.1