In short, I want to implement the pre-processing steps that ABBYY's technology suggests applying before OCR. The article describes two parts:
- Background Filtering: separate text strings from the background.
- Adaptive Binarization: binarize the image so that lines and words are detected correctly and recognition accuracy improves. According to the article, this step also affects how the individual characters come out.
Is there any way to achieve these using OpenCV? Any suggestions or sample code would be appreciated.
I would encourage you to use this code: http://liris.cnrs.fr/christian.wolf/software/binarize/ In particular, Wolf's binarization works really well in practice, and the C++ code needs very little change if you want to use it with OpenCV. Basically, you have to pass the pointer to your image data to this function.
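To give an idea of what this kind of binarization does, here is a minimal sketch of a Wolf-Jolion style local threshold working on a raw pointer to 8-bit grayscale data. It is not the code from the link above: the function name wolfBinarize, the 15-pixel window and k = 0.5 are my own assumptions for illustration, and it uses only the standard library so it does not depend on any particular OpenCV version. The threshold for each pixel is T = m + k * (s / R - 1) * (m - M), where m and s are the local mean and standard deviation, R is the largest local standard deviation in the image and M is the darkest gray value in the image.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV or of Wolf's package).
// src points to rows*cols 8-bit gray pixels, row-major, without row padding.
// With a continuous CV_8UC1 cv::Mat this would presumably be mat.data.
// Returns a buffer of the same size: 255 = background, 0 = text.
std::vector<std::uint8_t> wolfBinarize(const std::uint8_t* src, int rows, int cols,
                                       int window = 15, double k = 0.5) {
    assert(src != nullptr && rows > 0 && cols > 0 && window > 0);
    const int half = window / 2;
    const std::size_t stride = static_cast<std::size_t>(cols) + 1;
    const std::size_t n = static_cast<std::size_t>(rows) * cols;

    // Integral images of the values and of the squared values.
    std::vector<double> sum((rows + 1) * stride, 0.0), sq((rows + 1) * stride, 0.0);
    std::uint8_t minGray = 255;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const std::uint8_t p = src[static_cast<std::size_t>(y) * cols + x];
            const double v = p;
            const std::size_t i = (y + 1) * stride + (x + 1);
            sum[i] = v + sum[i - 1] + sum[i - stride] - sum[i - stride - 1];
            sq[i] = v * v + sq[i - 1] + sq[i - stride] - sq[i - stride - 1];
            minGray = std::min(minGray, p);
        }
    }

    // Sum of table t over rows y0..y1 and columns x0..x1 (inclusive).
    auto box = [&](const std::vector<double>& t, int y0, int x0, int y1, int x1) {
        return t[(y1 + 1) * stride + (x1 + 1)] - t[y0 * stride + (x1 + 1)]
             - t[(y1 + 1) * stride + x0] + t[y0 * stride + x0];
    };

    // Local mean and standard deviation; windows are clipped at the borders.
    std::vector<double> mean(n), sd(n);
    double maxSd = 0.0;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const int y0 = std::max(0, y - half), y1 = std::min(rows - 1, y + half);
            const int x0 = std::max(0, x - half), x1 = std::min(cols - 1, x + half);
            const double area = static_cast<double>(y1 - y0 + 1) * (x1 - x0 + 1);
            const double m = box(sum, y0, x0, y1, x1) / area;
            const double var = std::max(0.0, box(sq, y0, x0, y1, x1) / area - m * m);
            const std::size_t i = static_cast<std::size_t>(y) * cols + x;
            mean[i] = m;
            sd[i] = std::sqrt(var);
            maxSd = std::max(maxSd, sd[i]);
        }
    }

    // Apply T = m + k * (s / R - 1) * (m - M) to every pixel.
    std::vector<std::uint8_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double ratio = maxSd > 0.0 ? sd[i] / maxSd : 0.0;
        const double th = mean[i] + k * (ratio - 1.0) * (mean[i] - minGray);
        out[i] = src[i] >= th ? 255 : 0;
    }
    return out;
}
```

If you call it on a cv::Mat, check mat.isContinuous() first (or clone the image), since the sketch assumes there is no padding between rows. Window size and k probably need tuning for your font size and image resolution.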
Here are a couple of papers that I hope will be useful:
Paper from Xerox: http://www.xrce.xerox.com/content/download/6708/51560/file/Binarising-camera-images-for-OCR.pdf
And another good paper about image preprocessing for OCR: http://wbieniec.kis.p.lodz.pl/research/files/07_memstech_ocr.pdf