I have an idea for a CMS enhancement: extracting text from images (for example, scanned documents). Is there already anything out there to help me along?
Specifically, I want to know whether there is an existing OCR script written in JavaScript that can extract sentences/words from an image (using canvas, for example).
I know there are some scripts that do relatively small tasks such as captcha-cracking, but I haven't yet come across a script for extracting full sentences.
Is there such a thing, or would I need to write it from scratch?
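If you did end up writing it from scratch, the usual first steps are to read the pixels out of a canvas with `getImageData`, binarize them into "ink" and "background", and then split the page into text lines. A minimal sketch of just that preprocessing, assuming an `ImageData`-shaped object (`width`, `height`, RGBA `data`); the helper names `binarize` and `findTextLines` and the fixed threshold of 128 are hypothetical choices for illustration. Actually recognizing the characters in each line is the hard part, which is what engines like Ocrad.js or Tesseract handle.

```javascript
// Hypothetical helper: converts RGBA pixel data (as returned by
// CanvasRenderingContext2D.getImageData) into a 0/1 bitmap, where 1 = dark "ink".
// The alpha channel is ignored; the fixed threshold is an assumption.
function binarize(imageData, threshold = 128) {
  const { width, height, data } = imageData;
  const bitmap = new Uint8Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = data[i * 4];
    const g = data[i * 4 + 1];
    const b = data[i * 4 + 2];
    // ITU-R BT.601 luma weights to get a grayscale value
    const luma = 0.299 * r + 0.587 * g + 0.114 * b;
    bitmap[i] = luma < threshold ? 1 : 0;
  }
  return bitmap;
}

// Hypothetical helper: returns [firstRow, lastRow] pairs for each horizontal
// band of rows that contains ink (a simple projection-profile line finder).
function findTextLines(bitmap, width, height) {
  const lines = [];
  let start = -1;
  for (let y = 0; y < height; y++) {
    let hasInk = false;
    for (let x = 0; x < width; x++) {
      if (bitmap[y * width + x]) {
        hasInk = true;
        break;
      }
    }
    if (hasInk && start === -1) start = y; // a new line begins
    if (!hasInk && start !== -1) {
      lines.push([start, y - 1]); // the current line ended on the previous row
      start = -1;
    }
  }
  if (start !== -1) lines.push([start, height - 1]); // line runs to the bottom edge
  return lines;
}

// In a browser you would feed it from a canvas, e.g.:
// const ctx = canvas.getContext("2d");
// const img = ctx.getImageData(0, 0, canvas.width, canvas.height);
// const lines = findTextLines(binarize(img), img.width, img.height);
```

A fixed threshold works for clean scans; unevenly lit photos usually need an adaptive method (such as Otsu's) instead.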
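Antimatter15's Ocrad.js is a possibility. It is a port of the GNU Ocrad OCR program to JavaScript (compiled with Emscripten), so it runs in the browser.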
Take a look at https://github.com/selead/node-ocr. It's a CoffeeScript library for accessing the ABBYY Cloud OCR SDK service.
There is a Tesseract module for Node.js available on GitHub.