I'm looking to process a bunch of scanned response postcards that have handwritten contact information on them (i.e. name, address, phone, email, etc.).
I'm curious whether there is a viable open-source library or piece of software to do this (ideally in Java or R). From looking around, a lot of the information is from 2009 or earlier and isn't very encouraging.
The language is English.
Any suggestions?
EDIT: I've looked at the OCRopus page, but the latest version is from May 2009. Does anyone have experience with it, or is there a more recent version?
To begin with, as far as I know there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native engines, such as tesjeract (http://code.google.com/p/tesjeract/) and Tess4J (http://tess4j.sf.net/).
Next, you need to specify whether you are looking for handwritten or handprinted text. If you need handwriting recognition, I don't believe you'll be able to solve your task, for the reasons stated in the other answers.
However, if you need ICR (intelligent character recognition) for handprinted text (the fairly clear block letters used in surveys, forms, etc.), there could be a solution. I believe that Tesseract, despite being considered the best of the open-source engines, won't do the job for you here, but you can look for more accurate SDKs.
Maybe this question would help: Handwritten scanned Doc to .txt File?
I am not aware of any working open-source handwriting recognition library, even though I have been in the OCR space for a while. Handwriting is typically harder than printed-text OCR, and I would say there is not even a decent commercial solution. All those that exist have their own issues and only work in very narrow applications, for example when the dictionary is limited, the text is well written, etc. If you are still interested, I would recommend checking the technology from the French company A2iA.
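To illustrate why a limited dictionary helps so much: when a field can only take a known set of values (city names, say), a noisy recognition result can be snapped to the nearest allowed entry. The following is only a minimal sketch of that idea using plain edit distance. The class `DictionarySnap`, its methods, the sample city list and the `maxDistance` threshold are all made up for illustration and are not part of any OCR library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of any OCR engine.
final class DictionarySnap {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the dictionary entry closest to the recognized text
    // (case-insensitive), or null if nothing is within maxDistance edits.
    static String closest(String recognized, List<String> dictionary, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        String needle = recognized.toLowerCase();
        for (String entry : dictionary) {
            int d = editDistance(needle, entry.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed sample data: a small closed set of allowed values.
        List<String> cities = Arrays.asList("Springfield", "Shelbyville", "Capital City");
        System.out.println(closest("Sprinqfie1d", cities, 3)); // prints Springfield
        System.out.println(closest("xyz", cities, 3));         // prints null
    }
}
```

This only works for closed-vocabulary fields. Free-form fields such as names or street addresses have no small dictionary to snap to, which is part of why general handwriting recognition remains hard.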
You may want to look at http://code.google.com/p/ocropus/, which is an open-source OCR system.
However, it appears to be written in C++ and Python.
UPDATE:
Since one of the research projects it is based on is a handwriting recognizer, I expect it may help.
The OCRopus engine is based on two research projects: a
high-performance handwriting recognizer developed in the mid-90's and
deployed by the US Census bureau, and novel high-performance layout
analysis methods.
And if you look at http://code.google.com/p/ocropus/source/browse/, the source files have been updated since 10/2011 (one of the three was from 3/2012), so it appears to still be under active development.