OCR lib for math formulas

Posted 2019-01-21 02:49

I need an open OCR library that can scan complex printed math formulas (for example, formulas generated with LaTeX). I want LaTeX-like output (or just some AST-like data).
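For illustration, "AST-like data" for a formula could be a small tree of nodes that is then serialized to LaTeX. A minimal Python sketch follows; the node types (`Sym`, `Frac`, `Sup`, `Row`) and the `to_latex` function are hypothetical and are not the output format of any existing OCR tool.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node types for a formula tree (not any real OCR library's format).
@dataclass
class Sym:
    text: str  # a single symbol or identifier, e.g. "x" or "2"


@dataclass
class Frac:
    num: "Node"
    den: "Node"


@dataclass
class Sup:
    base: "Node"
    exp: "Node"


@dataclass
class Row:
    items: list  # a horizontal sequence of nodes


Node = Union[Sym, Frac, Sup, Row]


def to_latex(node: Node) -> str:
    """Serialize a formula tree into a LaTeX math string."""
    if isinstance(node, Sym):
        return node.text
    if isinstance(node, Frac):
        return r"\frac{" + to_latex(node.num) + "}{" + to_latex(node.den) + "}"
    if isinstance(node, Sup):
        return to_latex(node.base) + "^{" + to_latex(node.exp) + "}"
    if isinstance(node, Row):
        return " ".join(to_latex(item) for item in node.items)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# a/b^2 + c as a tree
expr = Row([Frac(Sym("a"), Sup(Sym("b"), Sym("2"))), Sym("+"), Sym("c")])
print(to_latex(expr))  # \frac{a}{b^{2}} + c
```

The tree is the more useful target for further processing (evaluation, search, conversion to MathML), while LaTeX is just one way of printing it.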

Does something like this already exist? Or can current OCR techniques only parse line-oriented text?

(Note that I also posted this question on Metaoptimize because some people there might have additional knowledge.)

The problem was also described by OpenAI as im2latex.

Tags: ocr
9 answers
Animai°情兽
Reply #2 · 2019-01-21 03:19

InftyReader is the only one I'm aware of. It is NOT free software (IIRC, the money goes to a non-profit organization).

http://www.sciaccess.net/en/InftyReader/

I don't understand why a PDF can't carry LaTeX as metadata. As in: put the LaTeX source of the equation in it! Is this so hard? (I don't know anything about PDF syntax, but I imagine it can be done.)
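It can be done, at least crudely: PDF allows custom keys in the document information dictionary alongside standard ones like Title and Author. The Python sketch below writes a one-page blank PDF by hand and stores the LaTeX source under a made-up `/LaTeX` key. The key name and both helper functions are hypothetical, ASCII-only input is assumed, and a real tool might instead use XMP metadata or embedded files.

```python
def pdf_literal(s: str) -> str:
    """Escape s as a PDF literal string: backslashes and parentheses get a backslash."""
    # Escape backslashes first, so the escapes added for parentheses are not doubled.
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def minimal_pdf_with_latex(latex: str) -> bytes:
    """Build a one-page blank PDF whose document information dictionary stores
    `latex` under a hypothetical /LaTeX key. Assumes ASCII-only input."""
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        "<< /LaTeX " + pdf_literal(latex) + " >>",  # object 4: the Info dictionary
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj", recorded for the xref table
        out += f"{num} 0 obj\n{body}\nendobj\n".encode("ascii")
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # entry for free object 0 (exactly 20 bytes)
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")  # each entry is exactly 20 bytes
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


pdf = minimal_pdf_with_latex(r"\frac{a}{b}")
```

Writing `pdf` to a `.pdf` file should give a blank page in most viewers, with the formula recoverable from the metadata. Note that this stores one string per document, not one per equation on the page.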

LaTeX syntax is THE ONE TRIED AND TRUE STANDARD for mathematics notation. It seems amazingly stupid that the folks who produced MathML and other stuff don't take this into consideration. InftyReader generates MathML or LaTeX syntax.

If I want pure HTML, I then use TTH to convert the LaTeX syntax. It just works.

ABBYY FineReader (a great OCR program) claims you can train the software for math, but this is immensely braindead (who has the time?).

And Unicode has lots of math symbols. That today's OCR readers can't grok them shows the sorry state of software and the brain deficit in this activity.

As to "one symbol at a time": TeX obviously has rules for where it places symbols. They can't write software that knows those rules?! TeX is even public domain! They could just "use it" in their commercial products.
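Recovering structure from symbol positions is roughly the inverse of what TeX does when it places scripts. Here is a deliberately tiny Python sketch of that idea, assuming the symbols have already been recognized one at a time and come with bounding boxes in image coordinates. Each symbol is classified as superscript, subscript or inline relative to the last inline symbol, using a single made-up threshold (`frac`). All names are hypothetical, and TeX's real placement rules depend on font parameters and cover far more cases (fractions, radicals, limits), so this is an illustration only.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A recognized symbol with its bounding box in image coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float  # top edge
    x1: float
    y1: float  # bottom edge

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def center_y(self) -> float:
        return (self.y0 + self.y1) / 2


def relation(base: Box, box: Box, frac: float = 0.25) -> str:
    """Return 'sup', 'sub' or 'inline' for `box` relative to `base`.
    `frac` is a made-up tuning threshold, not a TeX parameter."""
    if box.center_y < base.y0 + frac * base.height:
        return "sup"  # vertical center is near the top of the base or above it
    if box.center_y > base.y1 - frac * base.height:
        return "sub"  # vertical center is near the bottom of the base or below it
    return "inline"


def boxes_to_latex(symbols: list) -> str:
    """Scan symbols left to right and attach scripts to the last inline symbol."""
    parts = []  # [relation, text] groups in reading order
    base = None
    for box in sorted(symbols, key=lambda b: b.x0):
        rel = "inline" if base is None else relation(base, box)
        if rel != "inline" and parts and parts[-1][0] == rel:
            parts[-1][1] += box.label  # extend the current script group, e.g. x^{10}
        else:
            parts.append([rel, box.label])
        if rel == "inline":
            base = box  # only inline symbols serve as the base for later scripts
    prefix = {"sup": "^", "sub": "_"}
    return "".join(
        text if rel == "inline" else prefix[rel] + "{" + text + "}" for rel, text in parts
    )


symbols = [
    Box("x", 0, 10, 10, 20),
    Box("2", 11, 4, 16, 11),   # raised: superscript of x
    Box("+", 18, 12, 26, 18),
    Box("y", 28, 10, 38, 20),
    Box("i", 39, 17, 43, 24),  # lowered: subscript of y
]
print(boxes_to_latex(symbols))  # x^{2}+y_{i}
```

Real systems replace the fixed threshold with learned or font-aware models, but the two-stage shape (recognize symbols, then analyze their layout) is the same.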

We Are One
Reply #3 · 2019-01-21 03:22

Check out "Web Equation." It can convert handwritten equations to LaTeX, MathML, or SymbolTree. I'm not sure if the engine is open source.

姐就是有狂的资本
Reply #4 · 2019-01-21 03:30

You know, there's an application in Windows 7 just for that: Math Input Panel. It even handles handwritten input (it's actually made for that). Give it a shot if you have Windows 7; it's free!
