Markdown to docx, including complex template

Published 2019-01-29 21:12

Question:

I have automated my build to convert Markdown files to DOCX files using Pandoc. I have even used a reference document for the final document's styling. The command I use is:

pandoc -f markdown -t docx --data-dir=docs/rendering/ mydoc.md -o mydoc.docx

The reference.docx is picked up by Pandoc from docs/rendering and Pandoc renders mydoc.docx with the same styles as the reference doc.

However, reference.docx contains more than just styles. It contains corporate logos, preamble, etc.

How can I automate the merging of the Markdown content with both the styles and the content of reference.docx? My solution needs to work on Linux.

Answer 1:

Update

Use the piped version suggested by user Christian Long:

pandoc -t latex mydoc.md | pandoc -f latex --data-dir=docs/rendering/ -o mydoc.docx

I know this is late in coming, but I assume people are still searching for solutions to this three years after the original question; I know I was.

My solution was to use LaTeX as an intermediary between markdown and docx (actually, I was converting from org-mode, but same difference). So in your case, I believe a one-liner solution would be:

pandoc -f markdown -t latex -o mydoc.tex mydoc.md && \
pandoc -f latex -t docx --data-dir=docs/rendering/ -o mydoc.docx mydoc.tex

This might get you closer to your goal. Of course, Pandoc accepts about a hundred arguments, and there are probably ways to make this prettier. It has also received quite a few updates since you first posted your question.
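If the build is scripted, the same two-stage conversion can also be driven from Python. This is a minimal sketch. It assumes `pandoc` is on the PATH, and the function names are illustrative. The LaTeX is piped between the two processes, as in the update above, rather than written to mydoc.tex.

```python
import subprocess


def latex_pipeline_commands(src, out, data_dir):
    """Return the two pandoc argv lists for markdown -> LaTeX -> docx."""
    to_latex = ["pandoc", "-f", "markdown", "-t", "latex", src]
    # Reads LaTeX from stdin; reference.docx is picked up from data_dir.
    to_docx = ["pandoc", "-f", "latex", "-t", "docx",
               f"--data-dir={data_dir}", "-o", out]
    return to_latex, to_docx


def run_latex_pipeline(src, out, data_dir):
    """Same as: pandoc -t latex SRC | pandoc -f latex --data-dir=DIR -o OUT"""
    to_latex, to_docx = latex_pipeline_commands(src, out, data_dir)
    latex = subprocess.run(to_latex, check=True, capture_output=True,
                           encoding="utf-8").stdout
    subprocess.run(to_docx, input=latex, check=True, encoding="utf-8")
```

Usage: `run_latex_pipeline("mydoc.md", "mydoc.docx", "docs/rendering/")`. Keeping the argv construction separate makes the exact flags easy to inspect or log in a build.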



Answer 2:

Ideally, Pandoc would gain this feature, but that doesn't look likely any time soon.

I don't know of any tools that will do the job directly, but you could probably fall back to merging reference.docx and your Pandoc-produced mydoc.docx in code.

The .docx format is a ZIP archive of (mostly) XML files. The most important is word/document.xml. If you use an XML tool to take (most of) the document.xml from one file and insert it into the other, you'll have something closer to what you need.

I could hack together an example in, say, Ruby if an illustration would help.
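Here is a rough sketch of that idea in Python rather than Ruby, using only the standard library. It assumes both files are ordinary UTF-8 .docx packages whose word/document.xml uses the usual `w:` prefix and a plain `<w:body>` tag. It appends Pandoc's body to the end of the reference body, so the logos and preamble come first, and it keeps the reference's own trailing `<w:sectPr>` (page size, margins, header/footer references). It deliberately ignores relationships, so images, hyperlinks, footnotes and list numbering from mydoc.docx will not carry over. Copying the matching entries from `word/_rels/document.xml.rels`, `word/media/` and `word/numbering.xml` would be the next step. `merge_docx` and `splice_document_xml` are hypothetical names.

```python
import re
import zipfile

DOC = "word/document.xml"
ROOT_TAG = re.compile(r"<w:document\b[^>]*>")
XMLNS = re.compile(r'xmlns:(\w+)="([^"]*)"')
# A trailing, body-level <w:sectPr> (page size, margins, header/footer refs).
SECTPR = re.compile(r"<w:sectPr\b.*</w:sectPr>|<w:sectPr\b[^>]*/>", re.S)


def _final_sectpr_start(body):
    """Offset of the body-level <w:sectPr> at the end of `body`, else len(body)."""
    idx = body.rfind("<w:sectPr")
    if idx != -1 and SECTPR.fullmatch(body[idx:].strip()):
        return idx
    return len(body)  # no trailing body-level sectPr


def _body_span(xml):
    """(start, end) offsets of the text between <w:body> and </w:body>."""
    start = xml.index("<w:body>") + len("<w:body>")
    return start, xml.rindex("</w:body>")


def splice_document_xml(ref_xml, src_xml):
    """Append src's body, minus its page setup, to the end of ref's body."""
    # Declare any namespaces the pasted content uses but the reference root lacks.
    ref_root = ROOT_TAG.search(ref_xml).group(0)
    src_root = ROOT_TAG.search(src_xml).group(0)
    extra = "".join(f' xmlns:{p}="{uri}"' for p, uri in XMLNS.findall(src_root)
                    if f"xmlns:{p}=" not in ref_root)
    ref_xml = ref_xml.replace(ref_root, ref_root[:-1] + extra + ">", 1)

    start, end = _body_span(src_xml)
    src_body = src_xml[start:end]
    content = src_body[:_final_sectpr_start(src_body)]

    # Insert before the reference's own trailing <w:sectPr> so its layout is kept.
    start, end = _body_span(ref_xml)
    at = start + _final_sectpr_start(ref_xml[start:end])
    return ref_xml[:at] + content + ref_xml[at:]


def merge_docx(reference_path, content_path, out_path):
    """Copy every part of reference_path, replacing document.xml with the merge."""
    with zipfile.ZipFile(reference_path) as ref, zipfile.ZipFile(content_path) as src:
        merged = splice_document_xml(ref.read(DOC).decode("utf-8"),
                                     src.read(DOC).decode("utf-8"))
        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out:
            for item in ref.infolist():
                data = merged.encode("utf-8") if item.filename == DOC else ref.read(item)
                out.writestr(item, data)
```

Usage: `merge_docx("reference.docx", "mydoc.docx", "final.docx")`. The sketch splices strings instead of round-tripping through `xml.etree.ElementTree` on purpose. ElementTree renames namespace prefixes and drops unused declarations on output, which can break attributes such as `mc:Ignorable` that Word relies on.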



Answer 3:

Ideally you could use a custom docx template, but pandoc doesn't support that yet. A reference.docx file only allows custom styles to be embedded in newly created docx files.

Fortunately you can approximate this using odt instead of docx. You can fairly easily modify the default OpenDocument template to include your custom logos, preamble, and other stuff. Use the custom template in conjunction with a reference.odt file to get all the styles and custom content.
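The Pandoc half of this approach could look like the following Python sketch. It assumes pandoc 2.0 or later, where the ODT reference file is passed with `--reference-doc` (older releases used `--reference-odt`). `pandoc -D odt` prints the default template so you have something to edit. The file names `custom.opendocument` and `reference.odt`, and the helper names, are placeholders.

```python
import subprocess


def dump_default_odt_template(path):
    """Write pandoc's default ODT template to `path` (like `pandoc -D odt > path`)."""
    template = subprocess.run(["pandoc", "-D", "odt"], check=True,
                              capture_output=True, encoding="utf-8").stdout
    with open(path, "w", encoding="utf-8") as f:
        f.write(template)


def odt_command(src, out, template, reference):
    """argv for markdown -> odt with an edited template plus reference.odt styles."""
    return ["pandoc", "-f", "markdown", "-t", "odt",
            f"--template={template}",        # your logos, preamble, etc.
            f"--reference-doc={reference}",  # styles (pandoc >= 2.0 spelling)
            "-o", out, src]


# Hypothetical usage, once custom.opendocument has been edited by hand:
# subprocess.run(odt_command("mydoc.md", "mydoc.odt",
#                            "custom.opendocument", "reference.odt"), check=True)
```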

Once you have the file in odt format, you can use any number of command line tools to convert from odt to docx. For example, on Linux you can run

libreoffice --invisible --convert-to docx test.odt

Or on OS X:

/Applications/LibreOffice.app/Contents/MacOS/soffice.bin --invisible --convert-to docx test.odt
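The conversion step can be scripted the same way. This sketch passes `--headless` (the flag usually used for unattended conversions, in place of `--invisible`) together with `--outdir`, so the location of the .docx is predictable. It assumes LibreOffice names the output after the input file with a .docx extension, which is its normal behaviour. The `binary` parameter lets you point at the macOS path shown above instead of `libreoffice`. The function names are hypothetical.

```python
import os
import subprocess


def soffice_convert_command(odt_path, out_dir=".", binary="libreoffice"):
    """argv for a headless LibreOffice odt -> docx conversion."""
    return [binary, "--headless", "--convert-to", "docx",
            "--outdir", out_dir, odt_path]


def converted_path(odt_path, out_dir="."):
    """Where LibreOffice writes the result: same base name, .docx extension."""
    stem = os.path.splitext(os.path.basename(odt_path))[0]
    return os.path.join(out_dir, stem + ".docx")


def convert_odt_to_docx(odt_path, out_dir=".", binary="libreoffice"):
    """Run the conversion and return the path of the generated .docx."""
    subprocess.run(soffice_convert_command(odt_path, out_dir, binary), check=True)
    return converted_path(odt_path, out_dir)
```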


Answer 4:

UPDATE: this feature is incomplete

I used it on some complex templates and found it mapped the fonts, company logos, etc. very well. But going .docx -> .docx, I had to manually apply Heading styles to the chapter/section breaks. The font was correct, but the sectioning wasn't. I'll try .md -> .docx next.


This feature is now available in Pandoc, as described here:

Markdown to docx, including complex template

From the link above:

pandoc input --reference-docx=my-reference.docx -o out.docx

(In pandoc 2.0 and later, this option is spelled --reference-doc=my-reference.docx.)

where my-reference.docx (n.b. a .docx, not a .dotx) can be placed in:

  • the current folder OR
  • a folder which is defined by --data-dir OR
  • the system default folder for data-dir which is
    • $HOME/.pandoc on UNIX-like systems
    • C:\Documents And Settings\USERNAME\Application Data\pandoc on Windows XP (which you should no longer be using)
    • C:\Users\USERNAME\AppData\Roaming\pandoc on Windows Vista or later.
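To make the lookup order in that list concrete, here is a hypothetical Python helper that mirrors it. It checks the current folder first, then the `--data-dir` folder if one is given, and otherwise the per-user default. It only illustrates the list above and is not Pandoc's own implementation. On Windows it relies on `%APPDATA%`, which points to the Application Data (XP) or AppData\Roaming (Vista and later) folder shown.

```python
import os
import sys
from pathlib import Path


def default_data_dir():
    """Per-user default data dir, following the list above (hypothetical helper)."""
    if sys.platform.startswith("win"):
        # %APPDATA% is ...\Application Data on XP and ...\AppData\Roaming on Vista+
        appdata = os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming")
        return Path(appdata) / "pandoc"
    return Path.home() / ".pandoc"


def find_reference_docx(name, data_dir=None, cwd="."):
    """Return the first existing candidate for `name`, or None."""
    candidates = [Path(cwd) / name,
                  (Path(data_dir) if data_dir else default_data_dir()) / name]
    for path in candidates:
        if path.is_file():
            return path
    return None
```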