I know how to make a PDF from LaTeX. Is there a way to extract the LaTeX code from a PDF I created earlier? How about if someone sends me a PDF and I like the formatting? Can I extract the LaTeX from it?
Inkscape can import PDFs and then save them as "LaTeX with PSTricks macros", which essentially works by embedding PostScript into the LaTeX source. It's more trouble than it's worth, and the resulting LaTeX source has to be preprocessed before it can be turned back into a PDF.
Anyway, even with some hypothetical PDF-to-LaTeX compiler, at best you'd get something where the position and size of each character or word is separately specified -- the opposite of what you want, which I'm guessing is for a denominator to actually be one half of a fraction, rather than just some number sitting below a horizontal line.
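To see what a PDF actually stores, here is a minimal sketch, assuming the PyMuPDF library (imported as `fitz`) and a hypothetical `sample.pdf`, that dumps each word together with its coordinates -- positions and sizes, not document structure:

```python
# Minimal sketch: list the words a PDF stores, along with their bounding boxes.
# Assumes the PyMuPDF package (imported as "fitz") and a hypothetical sample.pdf.
import fitz  # pip install pymupdf

doc = fitz.open("sample.pdf")
for page in doc:
    # get_text("words") yields tuples: (x0, y0, x1, y1, word, block, line, word_no)
    for x0, y0, x1, y1, word, *_ in page.get_text("words"):
        print(f"{word!r} at ({x0:.1f}, {y0:.1f})-({x1:.1f}, {y1:.1f})")
doc.close()
```

Every word comes back as a box of coordinates; nothing in the output says "this is a fraction" or "this is a section heading", which is exactly the information a LaTeX source would need.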
LaTeX does not have a one-to-one conversion to PDF. With regard to your first question, I believe such a conversion may be technically possible, but I do not believe an application to do it exists yet. Similar to the way assembler can be decompiled back into a high-level language, there is probably a way to do it. However, a PDF is allowed to contain all manner of data: AutoCAD drawings, JPEG graphics, font files, forms, digital signatures, etc. LaTeX has no idea what these things are. So the answer to the second question is no: there is no way to extract equivalent LaTeX from an arbitrary PDF document.
It is possible to convert your PDF to HTML and your HTML to TeX using pdftohtml and gnuhtml2latex.
In effect, you are doing the PDF-to-LaTeX conversion in two steps. The result is still like "making a cow out of a hamburger", but combined with some cleanup scripts it can be pretty decent.
The blog post "Rudimentary PDF to LaTeX conversion in Linux" on GlobalBlindSpot has an example Bash script that converts a .pdf to a .tex file and then back to a .pdf.
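For a rough idea of what the two-step pipeline looks like, here is a minimal sketch, assuming pdftohtml (from poppler-utils) and gnuhtml2latex are on your PATH; exact flags and output file names can vary between versions:

```python
# Minimal sketch of the two-step PDF -> HTML -> TeX pipeline.
# Assumes pdftohtml (poppler-utils) and gnuhtml2latex are installed;
# exact flags and output file names may differ between versions.
import subprocess
import sys

pdf = sys.argv[1]                 # e.g. paper.pdf
stem = pdf.rsplit(".", 1)[0]

# Step 1: PDF -> a single HTML file (no frames, ignore images).
subprocess.run(["pdftohtml", "-noframes", "-i", pdf, f"{stem}.html"], check=True)

# Step 2: HTML -> LaTeX; gnuhtml2latex writes <stem>.tex next to its input.
subprocess.run(["gnuhtml2latex", f"{stem}.html"], check=True)

print(f"Wrote {stem}.tex (expect to clean it up by hand)")
```

The cleanup scripts mentioned above would then run over the generated .tex file.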
Because of its complicated format, the best way to mine data from PDF files is to open them with Adobe Illustrator, convert the PDF to an SVG file, and then use an SVG parser library with some tricky code of your own.
One efficient SVG parser library is Batik.
(On Linux, converting PDF to SVG is a bit more involved: calcmaster.net/personal_projects/pdf2svg/)
P.S. I have been trying for a long time to find a solution to the second part of your question, but books such as "Visualizing Data" (Ben Fry, O'Reilly) conclude that PDF, especially Adobe PDF, is too complex to parse, so use an SVG parser library instead.
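Batik is a Java toolkit, so as a stand-in illustration here is a minimal sketch using Python's standard xml.etree.ElementTree on a hypothetical page.svg exported from the PDF; it only pulls out text runs and their coordinates:

```python
# Minimal sketch: extract text runs and their coordinates from an SVG page.
# Batik is a Java toolkit; this stand-in uses Python's standard library on a
# hypothetical "page.svg" produced by an Illustrator or pdf2svg export.
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

tree = ET.parse("page.svg")
for text_el in tree.getroot().iter(f"{SVG_NS}text"):
    x = text_el.get("x", "?")
    y = text_el.get("y", "?")
    # itertext() also collects text nested inside <tspan> children
    content = "".join(text_el.itertext()).strip()
    if content:
        print(f"({x}, {y}): {content}")
```

As with the PDF case, what you get back is positioned text, and the "tricky code" is whatever heuristics you write to turn those positions back into structure.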
Short version: No.
Long version: It's a lot like decompiling: you technically could, but it would involve a lot of guessing and heuristics.
I'm not familiar with the PDF innards, but it most likely sets fonts, sizes, and positions directly, instead of defining a style and applying it to headers and such, as LaTeX does.
There is a tool called "Infty Reader" that reads PDF files like an OCR program and tries to recreate the LaTeX code. It's nearly perfect! But because LaTeX is quite extensible, I don't think it gets all the neat formatting right.