Software to Improve OCR Results Based on Output fr

Posted 2020-03-31 03:47

Is there an already-existing piece of commercial or academic software that can

  • overlay results from multiple OCR packages (Abbyy FineReader, Adobe Acrobat Professional, ReadIris, etc.)
  • provide fully automated improvements based on accumulated knowledge from multiple sources
  • allow for the use of additional external tools set up at runtime (dictionaries, batch web / local corpus look-ups, etc.)

?

Note: I already have in-house solutions to visualize results from single sources, so if no such software is obtainable, I would not mind developing my own : ) Inquiries for cooperation would then also be most welcome!
screenshot (source: sourceforge.net)

2 Answers
太酷不给撩
#2 · 2020-03-31 04:11

The idea of voting between several OCR engines is not new. The problem is that it does not really work. It would probably work if the engines were simple classifiers that were orthogonal by nature: you could then combine their votes and improve the results. But they are all very complicated pieces of software that use quite similar sets of well-known approaches with small variations, probably combined in different ways, and some implementations are simply better than others.
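To make the voting idea concrete, here is a minimal sketch of character-level majority voting. It assumes the engine outputs have already been aligned to equal length, which real OCR output rarely is, and the function name `majority_vote` is made up for illustration. It is not how any of the products mentioned here work internally.

```python
from collections import Counter


def majority_vote(outputs):
    """Character-level majority vote over pre-aligned OCR outputs.

    Assumption: every string has the same length and position i in one
    string corresponds to position i in all the others. Real outputs
    would need an alignment step before this.
    """
    if not outputs:
        return ""
    length = len(outputs[0])
    if any(len(text) != length for text in outputs):
        raise ValueError("outputs must be pre-aligned to equal length")

    voted = []
    for chars in zip(*outputs):
        # Pick the most frequent character at this position. On a tie,
        # Counter keeps first-seen order, so the earliest engine wins.
        voted.append(Counter(chars).most_common(1)[0][0])
    return "".join(voted)


# Independent errors are outvoted ...
print(majority_vote(["he1lo", "hello", "hel1o"]))
# ... but an error shared by similar engines wins the vote.
print(majority_vote(["c1ear", "c1ear", "clear"]))
```

The second call shows the weakness described above: when engines built on similar techniques make the same mistake, the vote confirms the mistake instead of correcting it.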

Experience shows that when you combine several OCR technologies, the best decision rule is to rely on the results of the most accurate one and just ignore the others. From my experience (I work for ABBYY), ABBYY OCR is definitely the most accurate of the ones you mentioned.

As far as I know, the only reason to use voting is to cross-check "suspicious" characters and send them to manual verification when 100% accuracy is a requirement. With this approach you increase the number of characters to verify, but reduce the chance of missing a wrong character.
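The cross-check can be sketched roughly as follows: keep the text of the most accurate engine as the result, and use a second engine only to flag places where the two disagree. This assumes plain-text output from both engines; the function name `suspicious_spans` is hypothetical, and the alignment uses `difflib` from the Python standard library rather than anything OCR-specific.

```python
from difflib import SequenceMatcher


def suspicious_spans(primary, secondary):
    """Return (start, end) index ranges in `primary` where `secondary` disagrees.

    The primary text is never changed; the spans only mark what a human
    should verify. A span with start == end marks a position where the
    secondary engine saw extra characters that the primary did not.
    """
    matcher = SequenceMatcher(None, primary, secondary, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((i1, i2))
    return spans


primary = "Invoice 1O234"    # hypothetical output of the trusted engine
secondary = "Invoice 10234"  # hypothetical output of a second engine
for start, end in suspicious_spans(primary, secondary):
    print(f"verify characters {start}..{end}: {primary[start:end]!r}")
```

Adding more secondary engines enlarges the set of flagged spans, which is the trade-off described above: more characters to verify by hand, but fewer wrong characters that slip through unnoticed.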

混吃等死
#3 · 2020-03-31 04:31

There are two options that I have worked with previously and would recommend.

  1. PrimeOCR. http://www.primerecognition.com/

It is a commercial offering that uses multiple OCR engines and voting to determine the best result. It is machine print only. Last time I used it they had 6 engines. Contact Alex Dahl.

I have used it in a major project scanning 20,000+ pages per day.

  2. RecoStar from OpenText.

RecoStar uses voting and can handle both handprint and machine print.
