I'm looking for an OCR library, ideally open-source, that I can use on some Arabic PDFs. Googling didn't turn up anything useful. Does anyone know of a suitable OCR library, or one that works on related languages (Farsi or Urdu, for example) and that Arabic support could be added to?
Any general suggestions on how to approach this will be appreciated.
I know nothing about Arabic OCR quality, but some intelligent Googling found Sakhr's Automatic Reader. It's commercial software.
Sorry. It's commercial, and quite expensive. Arabic is probably one of the hardest languages in the world to do OCR on -- I guess it takes a lot to motivate someone to do it.
Arabic is difficult for OCR because of the nature of its script (it is cursive, and letters change shape depending on their position in a word), and no free or commercial software gets 100% accuracy.
That is from my personal experience, but you can try IRIS Readiris Pro 14.
Starting with version 3.01, Tesseract-ocr supports Arabic.
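For anyone who wants to try the Tesseract route, here is a minimal sketch of calling the command-line tool from Python. It assumes the `tesseract` executable is on your PATH, that the Arabic language data (`ara`) is installed, and that each PDF page has already been converted to an image, since Tesseract takes images as input rather than PDFs. The function names and file names are made up for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="ara"):
    """Return the argument list for one Tesseract CLI call.

    Tesseract writes the recognized text to `<output_base>.txt`.
    `lang` is a Tesseract language code; "ara" is Arabic.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_image(image_path, output_base, lang="ara"):
    """Run Tesseract on a single page image and return the text it produced."""
    # Assumption: tesseract is installed separately; fail early if it is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

Something like `ocr_image("page1.png", "page1")` would then leave the text in `page1.txt` and return it as a string. Accuracy will depend heavily on scan quality, so expect to proofread the output.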