Which library to use to extract text from images?

I am writing a program that, when given an image of a simple math problem (e.g. 98*13), should output the answer. The numbers would be black and the background white. It is not a captcha, just an image of a math problem.

The math problems would only have two numbers and one operator, and that operator would only be +, -, *, or /.

Obviously, I know how to do the calculating ;) I'm just not sure how to go about getting the text from the image.

A free library would be ideal, although if I have to write the code myself I could probably manage.

Tags: c# ocr text-recognition

5 answers

倾城　Initia

Reply #2 · 2019-03-30 02:26

Try this post about using the C++ Google Tesseract OCR library from C#:

OCR with the Tesseract interface


Viruses.

Reply #3 · 2019-03-30 02:30

Here is some useful sample code for C#:

Using Tesseract: Free open-source OCR application for the Windows Desktop - A modern GUI front-end for the Tesseract OCR engine. The application also includes support for reading and OCR'ing PDF files: https://github.com/A9T9/Free-Ocr-Windows-Desktop
Using Microsoft OCR: Free open-source OCR application for the Windows Store - A modern GUI front-end for the Microsoft OCR library. The application also includes support for reading and OCR'ing PDF files: https://github.com/A9T9/Free-OCR-Software


放我归山

Reply #4 · 2019-03-30 02:44

To extract words from an image, I use Tesseract, the most accurate open-source OCR engine I know of. It is available here or directly as a NuGet package.

Here is my C# function, which extracts the words from the image at sourceFilePath. Set EngineMode to TesseractAndCube; it detects more words than the other options.

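using System.IO;
using Tesseract; // from the Tesseract NuGet package

var path = "YourSolutionDirectoryPath"; // folder that contains the "tessdata" directory
// "fra" loads the French language data; pass the code of whichever traineddata file you installed.
using (var engine = new TesseractEngine(Path.Combine(path, "tessdata"), "fra", EngineMode.TesseractAndCube))
using (var img = Pix.LoadFromFile(sourceFilePath))
using (var page = engine.Process(img))
{
    var text = page.GetText();
    // text now holds everything recognized in the image as a single string
}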

I hope that helps.
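
Once the OCR step returns a string, it usually needs a little cleanup before it can be evaluated. The following is a minimal sketch of that step, not part of the answer above: the class name OcrArithmetic and the method TryEvaluate are hypothetical, the list of character substitutions is an assumption about typical OCR confusions, and operands are assumed to be non-negative integers of at most nine digits, in line with the 98*13 example in the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: turns raw OCR output such as "98 x 13\n" into a result.
public static class OcrArithmetic
{
    // Two integer operands (1 to 9 digits each) around one of + - * /.
    private static readonly Regex Expression = new Regex(
        @"^\s*(\d{1,9})\s*([+\-*/])\s*(\d{1,9})\s*$");

    public static bool TryEvaluate(string ocrText, out decimal result)
    {
        result = 0m;
        if (string.IsNullOrWhiteSpace(ocrText))
        {
            return false;
        }

        // Assumed OCR confusions: letter x or the multiplication sign for '*',
        // the division sign for '/', and Unicode minus or en dash for '-'.
        string normalized = ocrText
            .Replace('x', '*').Replace('X', '*').Replace('\u00D7', '*')
            .Replace('\u00F7', '/')
            .Replace('\u2212', '-').Replace('\u2013', '-');

        Match match = Expression.Match(normalized);
        if (!match.Success)
        {
            return false;
        }

        decimal left = decimal.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
        decimal right = decimal.Parse(match.Groups[3].Value, CultureInfo.InvariantCulture);

        switch (match.Groups[2].Value)
        {
            case "+": result = left + right; return true;
            case "-": result = left - right; return true;
            case "*": result = left * right; return true;
            case "/":
                if (right == 0m)
                {
                    return false; // division by zero has no answer to report
                }
                result = left / right;
                return true;
            default:
                return false;
        }
    }
}
```

With the text variable from the snippet above, a call such as OcrArithmetic.TryEvaluate(text, out decimal answer) returns false when the recognized string is not a two-operand expression, which is a reasonable point to retry the OCR step or report an error.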


做自己的国王

Reply #5 · 2019-03-30 02:46

You need OCR. There is the free Tesseract library from Google, but it's native C++ code. You could use it in a C++/CLI project and access it from .NET.

This article gives some information on recognizing numbers (for Sudoku, but your problem is similar):

http://sudokugrab.blogspot.com/2009/07/how-does-it-all-work.html


霸刀☆藐视天下

Reply #6 · 2019-03-30 02:47

You can use Microsoft Office Document Imaging (Interop.MODI.dll) in Visual Studio to extract text from pictures:

Document modiDocument = new Document();
modiDocument.Create(filePath);
modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH);
MODI.Image modiImage = (modiDocument.Images[0] as MODI.Image);
string extractedText = modiImage.Layout.Text;
modiDocument.Close();
return extractedText;
