I have some HTML documents with MathJax equations, and I want to convert them to LaTeX, and then to PDF. I'd like to use Pandoc.
However, Pandoc replaces $ with \$, and it replaces \ in formulas with \textbackslash{}.
Is it possible to get Pandoc to pass MathJax formulas literally from HTML to LaTeX?
It's not an easy task. Here's a solution that should work, provided you only use $ and $$ as math delimiters, and assuming your document doesn't contain any other uses of $. (If you can't assume that, you can try adjusting the perl regex in what follows.)
Step 1: Install the Haskell Platform, if you don't have it already, and run 'cabal install pandoc' to get the pandoc library. (If you installed pandoc with the binary installer, you only have the executable, not the Haskell library.)
Step 2: Now write a small Haskell script -- we'll call it fixmath.hs:
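A minimal sketch of what such a filter could look like. It assumes the pandoc 1.x library API of that period (toJsonFilter and bottomUp exported by Text.Pandoc, raw HTML carried as a plain String); newer pandoc-types releases renamed these and switched to Text, so treat it as illustrative. The helper name unwrapMath is made up for this sketch.

```haskell
-- fixmath.hs: turn <!--MATH ... --> comments back into raw LaTeX.
-- Assumption: pandoc 1.x API (String-based Format and raw content).
import Data.List (stripPrefix)
import Text.Pandoc

main :: IO ()
main = toJsonFilter fixmath

-- Rewrite marked inlines first, then marked blocks, inside every block.
fixmath :: Block -> Block
fixmath = bottomUp fixmathBlock . bottomUp fixmathInline

-- Return the text between "<!--MATH" and "-->", if s has exactly that shape.
unwrapMath :: String -> Maybe String
unwrapMath s = do
  rest <- stripPrefix "<!--MATH" s
  body <- stripPrefix (reverse "-->") (reverse rest)
  return (reverse body)

fixmathInline :: Inline -> Inline
fixmathInline (RawInline "html" s)
  | Just tex <- unwrapMath s = RawInline "tex" tex
fixmathInline x = x

fixmathBlock :: Block -> Block
fixmathBlock (RawBlock "html" s)
  | Just tex <- unwrapMath s = RawBlock "tex" tex
fixmathBlock x = x
```

Anything that is not a MATH-tagged comment passes through unchanged, so ordinary raw HTML in the document is left alone.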
Compile this with GHC, for example by running ghc --make fixmath.hs.
This will give you an executable fixmath. Now, assuming your input file is input.html, the following command should convert it to LaTeX with the math intact, putting the result in output.tex:
The first part is a perl one-liner that puts your math bits in special HTML comments marked "MATH". The second part parses the HTML into a JSON representation of the Pandoc data structure corresponding to the document. Then fixmath transforms this structure, changing the special HTML comments into raw LaTeX blocks and inlines. (See Scripting with pandoc for an explanation.) Finally we convert from JSON back to LaTeX.
With the latest version of pandoc (1.12.2), you can do this:
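A sketch of that invocation, assuming pandoc 1.12.2 or later, where reader extensions are switched on by appending +name to the input format. The wrapper name html_math_to_latex is made up for illustration; the two extensions tell the HTML reader to treat $...$, $$...$$, \(...\) and \[...\] as math instead of escaping them.

```shell
# tex_math_dollars:          $...$ and $$...$$ are parsed as math.
# tex_math_single_backslash: \(...\) and \[...\] are parsed as math.
# Extra arguments (input file, -s, -o ...) are passed straight to pandoc.
html_math_to_latex() {
  pandoc -f html+tex_math_dollars+tex_math_single_backslash -t latex "$@"
}

# Usage: html_math_to_latex -s input.html -o output.tex
```

No perl preprocessing or Haskell filter is needed in this case, since the reader recognises the math itself.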
Much nicer! If you don't want to convert math delimited by \( and \), just do
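A sketch of the dollar-only variant, assuming pandoc 1.12.2 or later and its +extension syntax on the input format; the wrapper name html_dollar_math_to_latex is made up for illustration. Leaving out tex_math_single_backslash means \( and \) are not treated as math delimiters.

```shell
# Only $...$ and $$...$$ are parsed as math; \(...\) is left as ordinary text.
# Extra arguments (input file, -s, -o ...) are passed straight to pandoc.
html_dollar_math_to_latex() {
  pandoc -f html+tex_math_dollars -t latex "$@"
}

# Usage: html_dollar_math_to_latex -s input.html -o output.tex
```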