Re: converting attachments to text

Subject: Re: converting attachments to text

Date: Tue, 03 Jan 2017 12:23:34 -0500

To: Bart Bunting,


From: Brian Sniffen

Sure!  Here's what I use for docx, and I think it could be adapted to
pdf with pdftotext or whatever you're already using there.  You need a
small shell script that reads from STDIN, writes to a file, and calls
pandoc or pdftotext or whatever, like ~/bin/antiwordx:


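    #!/bin/sh
    # Read a .docx on stdin, write Markdown to stdout.
    tmpfile=$(mktemp /tmp/antiwordx.XXXXXX.docx)
    trap 'rm -f -- "$tmpfile"' INT TERM HUP EXIT
    cat > "$tmpfile"
    # Note: pandoc 2.0 and later no longer accept --normalize; drop it there.
    pandoc --normalize -r docx -w markdown "$tmpfile"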
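
For pdf, the same spool-to-a-temp-file pattern might look something like
the sketch below.  I have not run it against real mail.  The names
`pdf_to_text` and `PDFTOTEXT` are made up for illustration, and it
assumes the poppler/xpdf `pdftotext`, where `-` as the output file name
means "write to stdout".

```shell
#!/bin/sh
# Sketch only: pdf_to_text and PDFTOTEXT are hypothetical names.
# Reads a PDF on stdin and writes plain text to stdout.
pdf_to_text() {
    # pdftotext wants a real file, so spool stdin to a temp file first.
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/pdftotextx.XXXXXX") || return 1
    cat > "$tmpfile"
    # "-" as the output file name asks pdftotext to write to stdout.
    "${PDFTOTEXT:-pdftotext}" "$tmpfile" -
    status=$?
    # Clean up explicitly, since a function has no EXIT trap of its own.
    rm -f -- "$tmpfile"
    return "$status"
}
```

To use it as a filter, call `pdf_to_text < some.pdf` from a wrapper
script, or add a bare `pdf_to_text` as the last line and save the whole
thing under ~/bin.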

You need a small handler function to call it from Elisp; see the attached
file `inline-docx.el`, which assumes you have both the old `antiword`
for old-style `.doc` files and pandoc for new-style `.docx` files.

I apologize for the roughness of the code; it should probably use
customizable paths for pandoc and such.


inline-docx.el (application/emacs-lisp)

Bart Bunting <> writes:

> Hi,
> Just looking for some pointers.
> I have to deal with quite a few emails with attachments in either pdf or
> word format.
> I'm on a mac so can use applescript or something like pdftotext to
> convert them to text.
> I'm blind so use emacspeak as my primary interface.  Having an easy way
> to convert the notmuch attachments to text other than saving to a file
> and processing them would greatly speed up my workflow.
> Is there something in existence already to do this sort of thing?
> I have a little rudimentary lisp skill so can hack something up if
> someone can give me some pointers on a direction to head in.
> Any advice appreciated.
> Kind regards
> Bart
> -- 
> Bart Bunting - URSYS
> PH: 02 87452811
> Mbl: 0409560005