Re: converting attachments to text

Subject: Re: converting attachments to text

Date: Tue, 03 Jan 2017 12:23:34 -0500

To: Bart Bunting, notmuch@notmuchmail.org

From: Brian Sniffen


Sure!  Here's what I use for docx, and I think it could be adapted to
pdf with pdftotext or whatever you're already using there.  You need a
small shell script that reads from STDIN, writes to a file, and calls
pandoc or pdftotext or whatever, like ~/bin/antiwordx:

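    #!/bin/sh
    # Read a .docx from stdin, save it to a temporary file, and print
    # it as Markdown on stdout.

    # BSD/macOS mktemp only fills in trailing Xs, so make a temporary
    # directory and put a file with a .docx name inside it.
    tmpdir=$(mktemp -d /tmp/antiwordx.XXXXXX) || exit 1
    trap 'rm -rf -- "$tmpdir"' INT TERM HUP EXIT
    tmpfile=$tmpdir/input.docx
    cat > "$tmpfile"
    # Newer pandoc releases (2.0 and later, I believe) no longer
    # accept --normalize; drop the flag there.
    pandoc --normalize -r docx -w markdown "$tmpfile"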
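
For PDF attachments, the same pattern might look like the sketch below. It assumes `pdftotext` (from poppler or xpdf) is installed. The function name `antipdf` and the `PDFTOTEXT` override variable are made up for illustration, and the last lines only run the function when the file is saved under the name `antipdf`.

```shell
#!/bin/sh
# Sketch: read a PDF from stdin and print its text on stdout.
# PDFTOTEXT is a hypothetical override for the converter's path.

antipdf() (
    # Runs in a subshell so the trap and variables stay local.
    tmpfile=$(mktemp /tmp/antipdf.XXXXXX) || exit 1
    trap 'rm -f -- "$tmpfile"' EXIT
    cat > "$tmpfile"
    # "-" as the output file tells pdftotext to write to stdout.
    "${PDFTOTEXT:-pdftotext}" "$tmpfile" -
)

# Run the converter only when this file is executed as "antipdf".
case "${0##*/}" in
    antipdf) antipdf ;;
esac
```

Something like `antipdf < report.pdf` should then print the text of a (hypothetical) `report.pdf`.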

You need a small handler function to call it from Elisp---see attached
file `inline-docx.el`, which assumes you have both the old `antiword`
for old-style .doc files and pandoc for new-style .docx files.

I apologize for the roughness of the code; it should probably use
customizable paths for pandoc and such.
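
On the shell side, the .doc/.docx split and the configurable paths could be sketched roughly as follows. This is not taken from `inline-docx.el`: the function name `attachment_to_text` and the `PANDOC` and `ANTIWORD` environment variables are assumptions for illustration, and `--normalize` is left out on the assumption that a newer pandoc is installed.

```shell
#!/bin/sh
# Sketch: choose a converter from the attachment's file name.
# PANDOC and ANTIWORD are hypothetical overrides for the tool paths.

attachment_to_text() (
    name=$1
    case $name in
        *.docx) ext=docx ;;
        *.doc)  ext=doc ;;
        *) echo "unsupported attachment: $name" >&2; exit 2 ;;
    esac

    # Temporary directory, so the file inside can keep its extension.
    tmpdir=$(mktemp -d /tmp/attachment.XXXXXX) || exit 1
    trap 'rm -rf -- "$tmpdir"' EXIT
    tmpfile=$tmpdir/input.$ext
    cat > "$tmpfile"

    case $ext in
        docx) "${PANDOC:-pandoc}" -f docx -t markdown "$tmpfile" ;;
        doc)  "${ANTIWORD:-antiword}" "$tmpfile" ;;
    esac
)
```

For example, `PANDOC=/usr/local/bin/pandoc attachment_to_text report.docx < report.docx` would use that particular pandoc binary (the path and file name here are only examples).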

-Brian

inline-docx.el (application/emacs-lisp)

Bart Bunting <bart.bunting@ursys.com.au> writes:

> Hi,
>
> Just looking for some pointers.
>
> I have to deal with quite a few emails with attachments in either pdf or
> word format.
>
> I'm on a Mac, so I can use AppleScript or something like pdftotext to
> convert them to text.
>
> I'm blind so use emacspeak as my primary interface.  Having an easy way
> to convert the notmuch attachments to text other than saving to a file
> and processing them would greatly speed up my workflow.
>
> Is there something in existence already to do this sort of thing?
>
> I have a little rudimentary lisp skill so can hack something up if
> someone can give me some pointers on a direction to head in.
>
> Any advice appreciated.
>
> Kind regards
>
> Bart
> -- 
>
> Bart Bunting - URSYS
> PH: 02 87452811
> Mbl: 0409560005
