Specific issue regarding MathJax, then (auto?) sav

Posted 2019-09-07 07:08

Question:

Look, this might be a dupe question, and apologies if it is... but honestly, everything I've found on the subject seems to be from 2007, or calls out special caveats for IE6 and the like.

The setup: a web page that uses math markup, with MathJax rendering the math in the browser (this works fine).

The user(s) need to be able to export this to some sort of document (Word, PDF, etc.) for distribution to proofreaders who are not permitted (or not wanted) "in the system" where the pages are served.

The issue: everything I've tried so far to export the rendered result to a document, OTHER than a user-initiated browser print, shows the unrendered markup rather than the final product.

This is obviously because MathJax renders the math client-side, in the browser, once the page has loaded; it's just an included JS script. No surprises there.

I can get close by making an AJAX call to a page that renders the math, then sending that whole blob of HTML to a third page that writes it to disk and re-serves it with MIME-type and Content-Disposition headers for MS Word. But the rendering is not correct, presumably because of packaging it up in a POST request. And that's a lot of steps to end up with a not-quite-right solution anyway.

I'm guessing the answer is going to be "you can't do that", at least not without one of the HUGE installs like TeX Live or MiKTeX and doing it on the back end with shell calls... but I don't have the ability to install anything on these hosts anyway.

Am I stuck with users doing a print-to-PDF solution? Is there something I'm missing?

Thanks, happy to flesh out where needed, but I can't be the first trying to do this.

Answer 1:

For PDF there are a couple of options and it mostly depends on how much work you want to put in.

The quick and dirty solution might be wkhtmltopdf, but you'll have to specify a wait time for the JavaScript rendering to finish, which is not ideal.
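A minimal Python sketch of the wkhtmltopdf route, assuming wkhtmltopdf is on the PATH and the page is reachable by URL or file path. `--javascript-delay` is the fixed wait mentioned above (in milliseconds); the 3000 ms default is an arbitrary guess, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def wkhtmltopdf_cmd(url: str, out_pdf: str, delay_ms: int = 3000) -> List[str]:
    """Build a wkhtmltopdf command that waits delay_ms after load before printing."""
    return [
        "wkhtmltopdf",
        "--javascript-delay", str(delay_ms),  # fixed wait for MathJax to typeset
        "--no-stop-slow-scripts",             # don't abort long-running scripts
        url,
        out_pdf,
    ]


def render_pdf(url: str, out_pdf: str, delay_ms: int = 3000) -> None:
    """Run wkhtmltopdf; raises if the binary is missing or the conversion fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not on PATH")
    subprocess.run(wkhtmltopdf_cmd(url, out_pdf, delay_ms), check=True)
```

For example, `render_pdf("https://example.com/page.html", "page.pdf", delay_ms=5000)`. If the PDF still shows raw TeX, the delay was too short, which is exactly why a fixed wait is fragile.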

PhantomJS requires slightly more work but lets you listen in on the page, so you can wait until MathJax has actually finished; this discussion links to a simple example. (There are lots of PhantomJS-based tools out there, actually.)
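PhantomJS scripts are written in JavaScript, so rather than reproduce one here, this Python sketch shows the same "wait for a signal from the page" idea with a different tool: wkhtmltopdf's `--window-status` option, which waits until `window.status` equals a given string. It assumes the page loads MathJax v2 with an ordinary (non-async) script tag before the injected snippet, so that `MathJax.Hub.Queue` exists when it runs; the marker string and function names are hypothetical.

```python
import re
from typing import List

READY = "mathjax-done"  # hypothetical marker; any unique string works

# MathJax v2 (assumed): a callback queued with MathJax.Hub.Queue runs after the
# actions queued before it, including the initial typeset of the page.
SIGNAL_SCRIPT = (
    '<script>MathJax.Hub.Queue(function () { window.status = "'
    + READY
    + '"; });</script>'
)


def add_ready_signal(html: str) -> str:
    """Insert the signal script before the last </body>, or append it if absent."""
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html + SIGNAL_SCRIPT
    idx = matches[-1].start()
    return html[:idx] + SIGNAL_SCRIPT + html[idx:]


def wkhtmltopdf_wait_cmd(src: str, out_pdf: str) -> List[str]:
    """Wait for window.status == READY instead of a fixed delay."""
    return ["wkhtmltopdf", "--window-status", READY,
            "--no-stop-slow-scripts", src, out_pdf]
```

The design trade-off: no guessing at a delay, but if MathJax fails to load the signal never fires, so you may still want an external timeout around the process.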

Another way would be to first pre-process the page with MathJax-node and then pass the result to wkhtmltopdf (then you don't have to wait for MathJax at all).
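A rough sketch of that two-step pipeline, assuming the `mjpage` command-line tool from the mathjax-node-page package (a wrapper around MathJax-node that reads HTML on stdin and writes typeset HTML on stdout; check its README for output-format options) and wkhtmltopdf are both installed. The file and function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Tuple


def pipeline_cmds(tmp_html: str, out_pdf: str) -> Tuple[List[str], List[str]]:
    """Return the two commands: pre-render the math, then convert to PDF."""
    # Assumed CLI usage: mjpage reads HTML on stdin, writes typeset HTML on stdout.
    prerender = ["mjpage"]
    # No --javascript-delay: the math is already static markup at this point.
    to_pdf = ["wkhtmltopdf", tmp_html, out_pdf]
    return prerender, to_pdf


def html_to_pdf(html: str, out_pdf: str, tmp_html: str = "prerendered.html") -> None:
    for tool in ("mjpage", "wkhtmltopdf"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not on PATH")
    prerender, to_pdf = pipeline_cmds(tmp_html, out_pdf)
    # Step 1: typeset the math server-side and save the result.
    result = subprocess.run(prerender, input=html, capture_output=True,
                            encoding="utf-8", check=True)
    with open(tmp_html, "w", encoding="utf-8") as f:
        f.write(result.stdout)
    # Step 2: plain HTML-to-PDF conversion of the pre-rendered page.
    subprocess.run(to_pdf, check=True)
```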

For doc/docx I don't think there is any way right now. The natural way would be to use MathJax-node to generate MathML, since Word can import MathML. But Word does not seem to support MathML when it is imported from HTML. The same holds for generating SVG with MathJax-node (and with SVG you would lose the ability to edit the equations, so that might be prohibitive anyway).

Pandoc might eventually help. It can apparently convert mathematics to the MS Office format (see demo #30), but from a quick test this doesn't seem to work for HTML input right now.
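If the source math is also available as TeX (rather than only as the served HTML), a sketch along these lines uses pandoc to produce a .docx in which the math becomes native Word equations. It assumes pandoc is installed and that the content is Markdown with `$...$` math; the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def pandoc_docx_cmd(out_docx: str) -> List[str]:
    """Read Markdown from stdin and write a .docx file."""
    # pandoc's docx writer renders TeX math as Word (OMML) equations.
    return ["pandoc", "--from", "markdown", "--to", "docx", "--output", out_docx]


def markdown_to_docx(markdown_text: str, out_docx: str) -> None:
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not on PATH")
    # With no input file arguments, pandoc reads its input from stdin.
    subprocess.run(pandoc_docx_cmd(out_docx), input=markdown_text,
                   encoding="utf-8", check=True)
```

For example, `markdown_to_docx("Euler: $e^{i\\pi} + 1 = 0$", "proof.docx")` should yield a document the proofreaders can open and edit in Word.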



Answer 2:

If you are considering commercial solutions, have a look at pdfChip from callas software (warning: I'm heavily affiliated with this solution).

It does HTML to PDF and will actually convert MathML using MathJax into a proper PDF file (that can even be a PDF/X or PDF/A file if you so desire). I'll be happy to provide more details off-line.