I'm working on an OCR project for iPhone using the Tesseract OCR engine. I'm planning to write the following modules:
- Capture image from iPhone camera
- Pre-process the image to refine it and improve the OCR output.
- Divide the OCR output into meaningful fields.
- Define some rules for the OCR engine so that it ignores any undefined characters
(e.g. if the OCR output is
0226s5242
I want it to ignore the "s" character).
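
To make the last item concrete, here is a minimal sketch of what I mean, written in Python only for illustration. It assumes the field should contain digits only and simply filters the text after the engine returns it. The names `ALLOWED_CHARS` and `drop_undefined_chars` are hypothetical, not part of Tesseract. As far as I know, Tesseract also has a `tessedit_char_whitelist` configuration variable that restricts recognition inside the engine itself, though its behaviour may depend on the engine version.

```python
import string

# Assumption: the field is numeric, so only the digits 0-9 are "defined".
ALLOWED_CHARS = frozenset(string.digits)


def drop_undefined_chars(ocr_text, allowed=ALLOWED_CHARS):
    """Return ocr_text with every character outside `allowed` removed."""
    return "".join(ch for ch in ocr_text if ch in allowed)


if __name__ == "__main__":
    # The stray "s" from the example above is dropped.
    print(drop_undefined_chars("0226s5242"))  # prints 02265242
```

A filter like this only cleans up the result. It cannot recover a digit that was misread as a letter, which is why restricting the character set inside the engine may be the better choice.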
I want to begin learning the topics related to these modules. I'm not familiar with OCR-related techniques, so any advice would be very helpful. Thanks.