How to use Python filter with Pandoc to convert md

Posted 2019-04-21 09:03

Question:

I am trying to use a Pandoc filter to convert a markdown file with a tikz picture to html. I am on Win 8.1 (and I have all the dependencies -- pdflatex, Python 2.7, ImageMagick, and the pandocfilters Python package). I am using the tikz.py script that John MacFarlane provides on github.

I found a similar question on the Pandoc Google Group and John MacFarlane suggests wrapping the filter in a Windows batch script (the filter must be an executable). Here is my command line input (I'll provide the file contents below).

pandoc -o temp.html --filter .\tikz.bat -s temp.md

But I keep getting the following error.

pandoc: Failed reading: satisfyElem

The script generates the "tikz-images" subfolder, but it is empty, as is the resulting output file temp.html.

How can I get this to work? FWIW, the bigger goal is for the input files to be R Markdown, but I want to understand the Pandoc Markdown to HTML process first.

Here are the file contents.

tikz.bat

python tikz.py %*

temp.md

\begin{tikzpicture}

\draw [<->](-3,0)--(3,0);
\draw (-2,-.2)--(-2,.2);
\draw (-1,-.2)--(-1,.2);
\draw(0,-.2)--(0,.2);
\draw (1,-.2)--(1,.2);
\draw (2,-.2)--(2,.2);
\node[align=left,below] at (-4.5,-0.2) {Cash flow};
\node[align=left,above] at (-4.5,0.2) {Time period};
\node[align=left,above] at (-2,0.2) {-2};
\node[align=left,above] at (-1,0.2) {-1};
\node[align=left,above] at (0,0.2) {0};
\node[align=left,above] at (1,0.2) {+1};
\node[align=left,above] at (2,0.2) {+2};
\node[align=left,below] at (1,-0.2) {\$100};
\node[align=left,below] at (2,-0.2) {\$100};

\end{tikzpicture}

Can this work?

tikz.py

#!/usr/bin/env python

"""
Pandoc filter to process raw latex tikz environments into images.
Assumes that pdflatex is in the path, and that the standalone
package is available.  Also assumes that ImageMagick's convert
is in the path. Images are put in the tikz-images directory.
"""

import hashlib
import re
import os
import sys
import shutil
from pandocfilters import toJSONFilter, Para, Image
from subprocess import Popen, PIPE, call
from tempfile import mkdtemp

imagedir = "tikz-images"


def sha1(x):
    return hashlib.sha1(x.encode(sys.getfilesystemencoding())).hexdigest()


def tikz2image(tikz, filetype, outfile):
    tmpdir = mkdtemp()
    olddir = os.getcwd()
    os.chdir(tmpdir)
    f = open('tikz.tex', 'w')
    f.write("""\\documentclass{standalone}
             \\usepackage{tikz}
             \\begin{document}
             """)
    f.write(tikz)
    f.write("\n\\end{document}\n")
    f.close()
    p = call(["pdflatex", 'tikz.tex'], stdout=sys.stderr)
    os.chdir(olddir)
    if filetype == 'pdf':
        shutil.copyfile(tmpdir + '/tikz.pdf', outfile + '.pdf')
    else:
        call(["convert", tmpdir + '/tikz.pdf', outfile + '.' + filetype])
    shutil.rmtree(tmpdir)


def tikz(key, value, format, meta):
    if key == 'RawBlock':
        [fmt, code] = value
        if fmt == "latex" and re.match("\\\\begin{tikzpicture}", code):
            outfile = imagedir + '/' + sha1(code)
            if format == "html":
                filetype = "png"
            elif format == "latex":
                filetype = "pdf"
            else:
                filetype = "png"
            src = outfile + '.' + filetype
            if not os.path.isfile(src):
                try:
                    os.mkdir(imagedir)
                    sys.stderr.write('Created directory ' + imagedir + '\n')
                except OSError:
                    pass
                tikz2image(code, filetype, outfile)
                sys.stderr.write('Created image ' + src + '\n')
            return Para([Image([], [src, ""])])

if __name__ == "__main__":
    toJSONFilter(tikz)

Update: I mentioned in the comments that the caps.py filter also fails with the same symptoms. I should also add what happens when I run python caps.py temp.md, which invokes the filter outside of pandoc. My understanding is that this should print the caps.py file to the screen in all caps.

However, when I run python caps.py temp.md from the Windows command prompt it hangs. I kill the command with CTRL-C, then I get the following.

C:\Users\Richard\Desktop\temp>python caps.py temp.md
Traceback (most recent call last):
  File "caps.py", line 15, in <module>
    toJSONFilter(caps)

The same occurs with python tikz.py temp.md. A hang, followed by:

C:\Users\Richard\Desktop\temp>python tikz.py temp.md
Traceback (most recent call last):
  File "tikz.py", line 70, in <module>
    toJSONFilter(tikz)

Update 2: I tried to run the Windows debugger on the command prompt, but I'm not sure that it worked. Sometimes the command prompt would hang, and it seems the debugger hangs, too. Here is the output from the debugger.

*** wait with pending attach
Symbol search path is: *** Invalid ***
****************************************************************************
* Symbol loading may be unreliable without a symbol search path.           *
* Use .symfix to have the debugger choose a symbol path.                   *
* After setting your symbol path, use .reload to refresh symbol locations. *
****************************************************************************
Executable search path is: 
ModLoad: 00007ff7`0d920000 00007ff7`0d97d000   C:\windows\system32\cmd.exe
ModLoad: 00007fff`b7c20000 00007fff`b7dcc000   C:\windows\SYSTEM32\ntdll.dll
ModLoad: 00007fff`b5c90000 00007fff`b5dce000   C:\windows\system32\KERNEL32.DLL
ModLoad: 00007fff`b4e40000 00007fff`b4f55000   C:\windows\system32\KERNELBASE.dll
ModLoad: 00007fff`b7b70000 00007fff`b7c1a000   C:\windows\system32\msvcrt.dll
ModLoad: 00007fff`b3070000 00007fff`b307e000   C:\windows\SYSTEM32\winbrand.dll
(1c7c.29a0): Break instruction exception - code 80000003 (first chance)
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\windows\SYSTEM32\ntdll.dll - 
ntdll!DbgBreakPoint:
00007fff`b7cb2cf0 cc              int     3

Update 3: Here are the files in a Dropbox folder. This folder has the same files that I pasted above, plus the caps.py file, which comes directly from the Pandoc filters GitHub repo.

Answer 1:

The -t option takes a format name, not a file name with an extension. For example, pandoc -f json -t markdown outputs Markdown, -t html outputs HTML, and so on. To capture the output in a file, use the redirection operator: > file.some_extension. Without it, the output goes to the console. So the correct syntax is literally pandoc -f json -t markdown.

Also see the pandoc documentation. If you run into problems, try changing your line from pandoc -o temp.html --filter .\tikz.bat -s temp.md to pandoc -t json temp.md | ./caps.py latex | pandoc -f json -t html.

This is how it works.

                 source format = input_file.html
                      ↓
                   (pandoc) = pandoc -t json input_file.html
                      ↓
              JSON-formatted AST 
                      ↓
                   (filter)    = python $HOME/Downloads/pandocfilters-1.2.4/examples/caps.py
                      ↓
              JSON-formatted AST
                      ↓
                   (pandoc)    =  pandoc -f json -t markdown
                      ↓
                target format = output_file.md
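
The filter stage in the diagram is simply a program that reads the JSON AST on standard input and writes a modified AST to standard output. The sketch below imitates what caps.py does without using the pandocfilters package. The names upper_strs and run_filter are made up for illustration, and the code assumes Python 3 and the {"t": ..., "c": ...} node layout that appears in the JSON output in this answer. It also suggests why python caps.py temp.md appears to hang: a filter does not open the file named on the command line, it waits for JSON on stdin.

```python
import io
import json


def upper_strs(node):
    """Return a copy of a JSON AST node with every Str node's text upper-cased."""
    if isinstance(node, list):
        return [upper_strs(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upper_strs(val) for key, val in node.items()}
    return node  # plain strings, numbers and None are left untouched


def run_filter(instream, outstream):
    """Read a JSON document from instream, transform it, write it to outstream."""
    doc = json.load(instream)
    json.dump(upper_strs(doc), outstream)


# Demonstration with in-memory streams standing in for stdin and stdout.
source = io.StringIO(
    '[{"unMeta": {}}, [{"t": "Para", "c": ['
    '{"t": "Str", "c": "Hello"}, {"t": "Space", "c": []}, '
    '{"t": "Str", "c": "world!"}]}]]'
)
target = io.StringIO()
run_filter(source, target)
print(target.getvalue())
```

In a real filter script the last call would be run_filter(sys.stdin, sys.stdout), which blocks until an upstream pandoc -t json process supplies the JSON.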

Run the commands separately to examine the output of each stage, and use a pipe | to pass the output of one command to the next:

 pandoc -t json ~/testing/testing.html | python examples/caps.py | pandoc -f json -t markdown > output_file.md

There is no need to install pandocfilters. Download the tar file, extract it with tar -xvf file.x.y.z (or any other application of your choice), and call the example scripts with python dir/to/script.py. Then pipe the output to pandoc again and redirect the result to a file in the desired format. Here it is line by line:

 $pandoc -t json ~/testing/testing.html
[{"unMeta":{"viewport":{"t":"MetaInlines","c":[{"t":"Str","c":"width=device-width,"},{"t":"Space","c":[]},{"t":"Str","c":"initial-scale=1"}]},"title":{"t":"MetaInlines","c":[]},"description":{"t":"MetaInlines","c":[]}}},[{"t":"Para","c":[{"t":"Str","c":"Hello"},{"t":"Space","c":[]},{"t":"Str","c":"world!"},{"t":"Space","c":[]},{"t":"Str","c":"This"},{"t":"Space","c":[]},{"t":"Str","c":"is"},{"t":"Space","c":[]},{"t":"Str","c":"HTML5"},{"t":"Space","c":[]},{"t":"Str","c":"Boilerplate."}]},{"t":"Para","c":[{"t":"Str","c":"l"}]}]]

then:

$pandoc -t json ~/testing/testing.html | python examples/caps.py 
[{"unMeta": {"description": {"c": [], "t": "MetaInlines"}, "viewport": {"c": [{"c": "WIDTH=DEVICE-WIDTH,", "t": "Str"}, {"c": [], "t": "Space"}, {"c": "INITIAL-SCALE=1", "t": "Str"}], "t": "MetaInlines"}, "title": {"c": [], "t": "MetaInlines"}}}, [{"c": [{"c": "HELLO", "t": "Str"}, {"c": [], "t": "Space"}, {"c": "WORLD!", "t": "Str"}, {"c": [], "t": "Space"}, {"c": "THIS", "t": "Str"}, {"c": [], "t": "Space"}, {"c": "IS", "t": "Str"}, {"c": [], "t": "Space"}, {"c": "HTML5", "t": "Str"}, {"c": [], "t": "Space"}, {"c": "BOILERPLATE.", "t": "Str"}], "t": "Para"}, {"c": [{"c": "L", "t": "Str"}], "t": "Para"}]]

finally:

pandoc -t json ~/testing/testing.html | python examples/caps.py | pandoc -f json -t markdown
HELLO WORLD! THIS IS HTML5 BOILERPLATE.
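
On Windows, where shell pipes and ./caps.py style invocations can be awkward, the same three-stage pipeline could also be driven from Python. This is only a sketch under stated assumptions, not something taken from the pandoc documentation: run_pipeline is a hypothetical helper, and the commented-out command lists assume that pandoc is on the PATH and that caps.py sits in an examples directory as in the commands above. The stages run one after another rather than streaming, which is fine for small documents.

```python
import subprocess
import sys


def run_pipeline(stages, data=b""):
    """Run each command in turn, feeding one stage's stdout to the next stage's stdin."""
    for command in stages:
        completed = subprocess.run(
            command, input=data, stdout=subprocess.PIPE, check=True
        )
        data = completed.stdout
    return data


# Hypothetical pandoc pipeline mirroring the shell commands above
# (needs pandoc and the filter script to be available):
# stages = [
#     ["pandoc", "-t", "json", "testing.html"],
#     [sys.executable, "examples/caps.py"],
#     ["pandoc", "-f", "json", "-t", "markdown"],
# ]
# markdown = run_pipeline(stages).decode("utf-8")

# Stand-in stages so the sketch runs without pandoc installed.
demo = [
    [sys.executable, "-c", "import sys; sys.stdout.write('hello world')"],
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
]
result = run_pipeline(demo)
print(result)
```

Because check=True is set, a failing stage raises subprocess.CalledProcessError instead of silently passing empty output to the next stage, which makes it easier to see which step of the pipeline is the one that breaks.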

notes:

diff -y pandoc_json.txt caps_json.txt
[{"unMeta":{"viewport":{"t":"MetaInlines","c":[{"t":"Str","c" / [{"unMeta": {"description": {"c": [], "t": "MetaInlines"}, "v