I've been searching for a while, and all I've found are other requests for OCR libraries. I would like to know how to use the simplest OCR library that is easy to install, with detailed instructions for adding it to a C# project.
If possible, I just want to add it like a usual DLL reference...
Example:
using org.pdfbox.pdmodel;
using org.pdfbox.util;
Also a little OCR code example would be nice, such as:
public string OCRFromBitmap(Bitmap Bmp)
{
    Bmp.Save(temppath, System.Drawing.Imaging.ImageFormat.Tiff);
    string OcrResult = Analyze(temppath);
    File.Delete(temppath);
    return OcrResult;
}
So please consider that I'm not familiar with OCR projects and give me an answer as if you were talking to a dummy.
Edit: I guess people misunderstood my request. I wanted to know how to integrate those open source OCR libraries into a C# project and how to use them. The link given as a duplicate does not give the answers I asked for at all.
There is a .NET wrapper for Tesseract 3.01: https://github.com/charlesw/tesseract-ocr-dotnet
Another option is to use Neevia Document Converter, which has built-in OCR capability. You can run pretty much any file type through it and it will produce a PDF that is essentially a big text document, which you can then open and search through using iTextSharp.
If anyone is looking into this, I've been trying different options and the following approach yields very good results. The following are the steps to get a working example:
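1. Add the .NET wrapper for Tesseract to your project: Install-Package Tesseract (https://github.com/charlesw/tesseract).
2. Download the language data you need, for example tesseract-ocr-3.02.eng.tar.gz (English language data for Tesseract 3.02).
3. Create a tessdata directory in your project and place the language data files in it.
4. Open the Properties of the newly added files and set them to copy on build.
5. Add a reference to System.Drawing.
6. From the wrapper repository's Samples directory, copy the sample phototest.tif file into your project directory and set it to copy on build.
7. Add two files to your project: Program.cs and FormattedConsoleLogger.cs.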
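With those steps in place, a minimal sketch of what Program.cs could look like is shown here. This is not the sample that ships with the wrapper. It assumes the Tesseract NuGet package is installed, that the English data sits in a tessdata folder next to the executable, and that phototest.tif was copied to the output directory. The helper names TessdataPath and OcrFromFile are made up for illustration.

```csharp
using System;
using System.IO;
using Tesseract; // from the Tesseract NuGet package (charlesw/tesseract)

public static class Program
{
    // Hypothetical helper: builds the tessdata path under a base directory.
    public static string TessdataPath(string baseDir)
    {
        return Path.Combine(baseDir, "tessdata");
    }

    // Hypothetical helper: runs OCR on one image file and returns the text.
    public static string OcrFromFile(string imagePath)
    {
        // Assumption: language data was copied to <output dir>/tessdata on build.
        string dataPath = TessdataPath(AppDomain.CurrentDomain.BaseDirectory);

        using (var engine = new TesseractEngine(dataPath, "eng", EngineMode.Default))
        using (var img = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(img))
        {
            return page.GetText();
        }
    }

    public static void Main()
    {
        // Assumption: phototest.tif is set to copy to the output directory.
        Console.WriteLine(OcrFromFile("phototest.tif"));
    }
}
```

If the engine fails to initialize, the usual suspects are a wrong tessdata path or language files that were not copied on build.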
Here's one: (check out http://hongouru.blogspot.ie/2011/09/c-ocr-optical-character-recognition.html or http://www.codeproject.com/Articles/41709/How-To-Use-Office-2007-OCR-Using-C for more info)
I'm using tesseract OCR engine with TessNet2 (a C# wrapper - http://www.pixel-technology.com/freeware/tessnet2/).
Some basic code:
...
Some online APIs work pretty well: ocr.space and Google Cloud Vision. Both of these are free as long as you make fewer than 1000 OCR requests per month. You can drag and drop an image to do a quick manual test to see how they perform on your images.
I find OCR.space easier to use (no messing around with NuGet libraries), but, for my purpose, Google Cloud Vision provided slightly better results than OCR.space.
Google Cloud Vision example:
OCR.space example:
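A minimal sketch of such a call, under stated assumptions: the endpoint is taken to be https://api.ocr.space/parse/image, the form fields are taken to be apikey, language and file, and the response is taken to be JSON with a ParsedResults array whose items carry a ParsedText string. Check these against the current OCR.space documentation before relying on them. The class and method names are hypothetical, and System.Text.Json is assumed to be available (it is built into .NET Core 3.0 and later).

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class OcrSpaceSketch
{
    // Extracts the text of the first parsed result; returns "" if there is none.
    // Assumption: response shape is { "ParsedResults": [ { "ParsedText": "..." } ] }.
    public static string ExtractText(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement results;
            if (!doc.RootElement.TryGetProperty("ParsedResults", out results)
                || results.ValueKind != JsonValueKind.Array
                || results.GetArrayLength() == 0)
            {
                return "";
            }

            JsonElement text;
            return results[0].TryGetProperty("ParsedText", out text)
                ? (text.GetString() ?? "")
                : "";
        }
    }

    // Uploads an image file and returns the recognized text.
    // Assumption: endpoint URL and form field names as described in the lead-in.
    public static async Task<string> OcrAsync(string imagePath, string apiKey)
    {
        using (var client = new HttpClient())
        using (var form = new MultipartFormDataContent())
        {
            form.Add(new StringContent(apiKey), "apikey");
            form.Add(new StringContent("eng"), "language");
            form.Add(new ByteArrayContent(File.ReadAllBytes(imagePath)),
                     "file", Path.GetFileName(imagePath));

            HttpResponseMessage response =
                await client.PostAsync("https://api.ocr.space/parse/image", form);
            response.EnsureSuccessStatusCode();

            return ExtractText(await response.Content.ReadAsStringAsync());
        }
    }
}
```

Keeping the JSON parsing in its own method means it can be checked against a saved response without touching the network.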