We are researching ways to enhance image quality before submitting images to OCR. The OCR engine we currently use is the Scansoft API from Nuance (v15). We looked into Lead Tools but have decided to look elsewhere because the licensing costs are just too high. To start with, we are looking for simple image enhancement features such as deskewing, despeckling, line removal, punch hole removal, and sharpening. We run a mix of .NET and Java software, but a Java solution would be preferred.
It depends on the number and quality of the original images. Managed code and imaging toolkits will work, but they are not always the best solution if you have several million images to process. For small batches and tight budgets, I agree with the previous posters that projects like Aforge, Paint.NET, and other open source computer vision libraries will do the trick. Of course, you are on your own if the results do not improve. At least this lets you put everything you need under one application for a low cost.
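Since a Java solution was preferred, here is a rough idea of what a do-it-yourself despeckle step can look like with nothing but the JDK. This is only a minimal sketch, not what any of the toolkits mentioned here do internally: it assumes an 8-bit grayscale input (`TYPE_BYTE_GRAY`) and uses a plain 3x3 median filter, which is just one common way to remove isolated specks. The class and method names (`Despeckle`, `median3x3`) are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

public class Despeckle {
    /**
     * Returns a filtered copy of an 8-bit grayscale image.
     * Each output pixel is the median of its 3x3 neighbourhood,
     * which removes isolated specks but keeps larger strokes.
     * Assumption: the input is TYPE_BYTE_GRAY (only band 0 is read).
     */
    public static BufferedImage median3x3(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        // Clamp coordinates so border pixels reuse their nearest neighbour.
                        int sx = Math.min(w - 1, Math.max(0, x + dx));
                        int sy = Math.min(h - 1, Math.max(0, y + dy));
                        window[n++] = src.getRaster().getSample(sx, sy, 0);
                    }
                }
                Arrays.sort(window);
                // The middle of the 9 sorted values is the median.
                dst.getRaster().setSample(x, y, 0, window[4]);
            }
        }
        return dst;
    }
}
```

Something like `ImageIO.write(Despeckle.median3x3(ImageIO.read(inFile)), "png", outFile)` would run it on a file, provided the file really decodes to a grayscale image. Expect it to be slow on large scans; it is meant to show the idea, not to compete with a tuned library.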
If you are processing several hundred thousand images a month, then I would suggest you divide the process into smaller workflow steps and tweak each one until your cost per image gets as close to zero as you can. You will find that the OCR results rise quickly at first and then level off sooner than you expected. (I'm not a big fan of OCR, but it has its place.)
I use a commercial Windows product from Recogniform to process and clean up the images prior to OCR in batch mode, using scripts adjusted for various kinds of images. If an image fails QC or is rejected by the OCR engine, it is "repaired" by hand using a custom .NET application built with Atalasoft's toolkit. Batch process everything and only touch what fails.
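To make the "batch everything, only touch what fails" idea concrete, here is a small Java sketch of the triage logic. It is not how Recogniform or Atalasoft work; `BatchTriage`, the step list, and the QC predicate are all hypothetical names, and in a real setup each step would be an image operation and the QC check would be whatever your OCR engine or quality rules report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class BatchTriage<T> {
    private final List<UnaryOperator<T>> steps; // cleanup steps, applied in order
    private final Predicate<T> qc;              // true if the cleaned item is good enough
    public final List<T> passed = new ArrayList<>();
    public final List<T> needsRepair = new ArrayList<>();

    public BatchTriage(List<UnaryOperator<T>> steps, Predicate<T> qc) {
        this.steps = steps;
        this.qc = qc;
    }

    /** Runs every item through all steps; originals that fail go to the repair queue. */
    public void run(List<T> batch) {
        for (T item : batch) {
            try {
                T cleaned = item;
                for (UnaryOperator<T> step : steps) {
                    cleaned = step.apply(cleaned);
                }
                if (qc.test(cleaned)) {
                    passed.add(cleaned);
                } else {
                    needsRepair.add(item); // keep the original for manual repair
                }
            } catch (RuntimeException e) {
                needsRepair.add(item);     // a crashing step also counts as a failure
            }
        }
    }
}
```

Keeping each step as a separate function makes it easy to tweak or time one step without touching the others, and the size of `needsRepair` after a run gives a rough idea of how much manual work a batch will cost.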