Is there a viable handwriting recognition library?

Posted 2020-03-19 02:03

I'm looking to process a batch of scanned response postcards that have handwritten contact information on them (i.e. name, address, phone, email, etc.).

I'm curious whether there is a viable open-source library or piece of software to do this (ideally in Java or R). From what I've found, a lot of the information dates from 2009 or earlier and isn't very encouraging.

The language is English.

Any suggestions?

EDIT: I've looked at the OCRopus page, but the latest version is from May 2009. Does anyone have experience with it, or is there a more recent version?

3 answers
何必那么认真
Reply #2 · 2020-03-19 02:23

You may want to look at http://code.google.com/p/ocropus/, which is an open-source OCR system.

However, it appears to be written in C++ and Python.

*UPDATE:*

Since one of the research projects behind it is a handwriting recognizer, I expect it may help.

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.

And if you look at http://code.google.com/p/ocropus/source/browse/, the source files have been updated since 10/2011 (one of the three was from 3/2012), so it appears to still be under development.

地球回转人心会变
Reply #3 · 2020-03-19 02:24

I am not aware of any working open-source handwriting recognition library, even though I have been in the OCR space for a while. Handwriting is typically more difficult than printed-text OCR, and I would say there is not even a decent commercial solution. Everything that exists has its own issues and works only in very narrow applications, for example when the dictionary is limited or the text is neatly written. If you are still interested, I would recommend checking the technology from the French company I2IA.
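To illustrate the "limited dictionary" case: when a field can only take a small set of values (a state name, a known city list), a noisy recognizer result can be snapped to the nearest allowed entry. The following is only a minimal sketch of that idea in plain Java, not part of any OCR library; the class `DictionaryCorrector`, its methods, and the `maxDistance` threshold are hypothetical names chosen for this example, and the raw string is assumed to come from whatever recognizer you end up using.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: snaps noisy recognizer output to the nearest entry of a small dictionary. */
final class DictionaryCorrector {

    /** Levenshtein edit distance: insertions, deletions and substitutions each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the dictionary entry closest to the raw text (case-insensitive),
     * or empty if no entry is within maxDistance edits.
     */
    static Optional<String> closest(String raw, List<String> dictionary, int maxDistance) {
        String needle = raw.trim().toLowerCase(Locale.ROOT);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String entry : dictionary) {
            int d = editDistance(needle, entry.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = entry;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed example dictionary; a real one would come from your own data.
        List<String> states = List.of("Ohio", "Iowa", "Idaho", "Oregon");
        System.out.println(closest("0hio", states, 2));   // prints Optional[Ohio]
        System.out.println(closest("Xyzzyx", states, 2)); // prints Optional.empty
    }
}
```

This only helps for closed-vocabulary fields; free-form fields such as personal names or street addresses have no small dictionary to snap to, which is part of why they remain hard.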

做自己的国王
Reply #4 · 2020-03-19 02:25

To begin with, as far as I know there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native libraries, such as tesjeract (http://code.google.com/p/tesjeract/) or Tess4J (http://tess4j.sf.net/).

Next, you need to specify whether you are dealing with handwritten or handprinted text. If you need recognition of cursive handwriting, I don't believe you'll be able to solve your task, for the reasons stated in the other answers.

However, if you need ICR (intelligent character recognition) for handprinted text (the fairly clear block letters used in surveys, forms, etc.), there could be a solution. While I believe that Tesseract, despite being considered the best of the open-source engines, won't do the job for you here, you can look for more accurate SDKs.

Maybe this question would help: Handwritten scanned Doc to .txt File?
