We are researching ways to enhance image quality before submitting images to OCR. The OCR engine we currently use is the ScanSoft API from Nuance (v15). We were evaluating LEADTOOLS but have since decided to look elsewhere, because its licensing costs are just too high. To start with, we are looking for basic image enhancement features such as deskewing, despeckling, line removal, punch hole removal, and sharpening. We run a mix of .NET and Java software, but a Java solution would be preferred.
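For reference, three of the listed steps (despeckling, sharpening, line removal) can be sketched in plain Java with only the JDK's `java.awt.image` classes. This is a minimal illustration, not any of the products recommended in the answers. It assumes 8-bit grayscale input (`BufferedImage.TYPE_BYTE_GRAY`, 0 = black, 255 = white). The class name `PreOcrCleanup`, its method names, the sharpening kernel, and the threshold/run-length parameters are hypothetical and untuned. Deskewing and punch hole removal are not covered.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.util.Arrays;

/** Hypothetical pre-OCR cleanup sketch; assumes TYPE_BYTE_GRAY input (0 = black, 255 = white). */
public class PreOcrCleanup {

    /** Reads a gray value, replicating the nearest edge pixel for out-of-range coordinates. */
    private static int sample(Raster r, int x, int y) {
        int cx = Math.min(Math.max(x, 0), r.getWidth() - 1);
        int cy = Math.min(Math.max(y, 0), r.getHeight() - 1);
        return r.getSample(cx, cy, 0);
    }

    private static BufferedImage blankLike(BufferedImage src) {
        return new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
    }

    /** Despeckle: 3x3 median filter, which removes isolated dark or light specks. */
    public static BufferedImage despeckle(BufferedImage src) {
        Raster in = src.getRaster();
        BufferedImage dst = blankLike(src);
        WritableRaster out = dst.getRaster();
        int[] window = new int[9];
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        window[n++] = sample(in, x + dx, y + dy);
                Arrays.sort(window);
                out.setSample(x, y, 0, window[4]); // median of the 9 neighbours
            }
        }
        return dst;
    }

    /** Sharpen: 3x3 convolution with a Laplacian-style kernel whose weights sum to 1. */
    public static BufferedImage sharpen(BufferedImage src) {
        int[] k = { 0, -1, 0, -1, 5, -1, 0, -1, 0 };
        Raster in = src.getRaster();
        BufferedImage dst = blankLike(src);
        WritableRaster out = dst.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int sum = 0, i = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += k[i++] * sample(in, x + dx, y + dy);
                out.setSample(x, y, 0, Math.min(255, Math.max(0, sum))); // clamp to 0..255
            }
        }
        return dst;
    }

    /** Line removal: whitens horizontal runs of dark pixels (< threshold) at least minRun long. */
    public static BufferedImage removeHorizontalLines(BufferedImage src, int threshold, int minRun) {
        BufferedImage dst = blankLike(src);
        dst.setData(src.getRaster()); // start from a copy of the source pixels
        WritableRaster out = dst.getRaster();
        int w = src.getWidth();
        for (int y = 0; y < src.getHeight(); y++) {
            int x = 0;
            while (x < w) {
                if (out.getSample(x, y, 0) >= threshold) { x++; continue; }
                int start = x; // beginning of a dark run
                while (x < w && out.getSample(x, y, 0) < threshold) x++;
                if (x - start >= minRun)
                    for (int i = start; i < x; i++) out.setSample(i, y, 0, 255);
            }
        }
        return dst;
    }
}
```

A median filter is used for despeckling because it drops single-pixel noise without blurring edges as much as an averaging filter would. The run-length line removal rule is deliberately naive: it also erases any character strokes that lie on a removed line, which a production pipeline would need to restore.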
Disclaimer: I work for Atalasoft
We offer those functions, with royalty-free run-time licensing, for .NET.
http://www.atalasoft.com/products/dotimage/
We also have OCR components, including a .NET wrapper for ABBYY, Tesseract and others, as well as searchable PDF generation (image on top of text in a PDF).
Look into Kofax VRS at kofax.com.
Maybe JMagick: it is an open source Java interface to ImageMagick, implemented as a thin Java Native Interface (JNI) layer over the ImageMagick API. It's licensed under the LGPL, so licensing shouldn't be a problem.
http://sourceforge.net/projects/jmagick/
Kofax is good for pre-processing, but it may be overkill for the types of cleanup you are describing unless the images are really bad. Unless your specialty is image processing, I'd recommend working with a provider that handles both the image cleanup and the OCR, so you can focus on the value you actually add.
We license the OCR development kit from ABBYY (the ABBYY SDK) and have found it to be superb for both image processing and OCR. The API is quite extensive, and the sample apps, help and support have been beyond impressive. I definitely recommend taking a look.
I would suggest Intel for its zero-cost runtime licensing.
Not sure if this would be up to the standards you guys need, but perhaps you should look at some of the Paint.NET APIs. I don't know how easy it would be to extract its image processing algorithms for use in your project, but I believe it does some of the things you are looking for. Plus it is an open source project under the MIT License, so it should be pretty friendly for business use.