.NET OCRing an Image

2019-03-16 00:15发布

I'm trying to use MODI to OCR a window's program. It works fine for screenshots I grab programmatically using win32 interop like this:

public string SaveScreenShotToFile()
{
    RECT rc;
    GetWindowRect(_hWnd, out rc);

    int width = rc.right - rc.left;
    int height = rc.bottom - rc.top;

    Bitmap bmp = new Bitmap(width, height);
    Graphics gfxBmp = Graphics.FromImage(bmp);
    IntPtr hdcBitmap = gfxBmp.GetHdc();

    PrintWindow(_hWnd, hdcBitmap, 0);

    gfxBmp.ReleaseHdc(hdcBitmap);
    gfxBmp.Dispose();

    string fileName = @"c:\temp\screenshots\" + Guid.NewGuid().ToString() + ".bmp";
    bmp.Save(fileName);
    return fileName;
}

This image is then saved to a file and ran through MODI like this:

    private string GetTextFromImage(string fileName)
    {

        MODI.Document doc = new MODI.DocumentClass();
        doc.Create(fileName);
        doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
        MODI.Image img = (MODI.Image)doc.Images[0];
        MODI.Layout layout = img.Layout;

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < layout.Words.Count; i++)
        {
            MODI.Word word = (MODI.Word)layout.Words[i];
            sb.Append(word.Text);
            sb.Append(" ");
        }

        if (sb.Length > 1)
            sb.Length--;

        return sb.ToString();
    }

This part works fine, however, I don't want to OCR the entire screenshot, just portions of it. I try cropping the image programmatically like this:

    private string SaveToCroppedImage(Bitmap original)
    {
        Bitmap result = original.Clone(new Rectangle(0, 0, 250, 250), original.PixelFormat);
        var fileName = "c:\\" + Guid.NewGuid().ToString() + ".bmp";
        result.Save(fileName, original.RawFormat);

        return fileName;
    }

and then OCRing this smaller image, however MODI throws an exception; 'OCR running error', the error code is -959967087.

Why can MODI handle the original bitmap but not the smaller version taken from it?

标签: c# .net ocr modi
7条回答
萌系小妹纸
2楼-- · 2019-03-16 00:30

I had the same issue while using the

doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

on a tiff file that was 2400x2496. Resizing it to 50%(reducing the size) fixed the problem and the method was not throwing the exception anymore, however, it was incorrectly recognizing the text like detecting "relerence" instead of "reference" or "712017" instead of "712517". I kept trying different image sizes but they all had the same issue, until i changed the command to

doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

which meant that i don't want it to detect the orientation and not to fix any skewing. Now the command works fine on all images including the 2400x2496 tiff.

Hope this helps out people facing the same problem

查看更多
SAY GOODBYE
3楼-- · 2019-03-16 00:32

the modi ocr is working only tif with me. try to save image in "tif".

sorry my bad english

查看更多
老娘就宠你
4楼-- · 2019-03-16 00:35

Looks as though the answer is in giving MODI a bigger canvas. I was also trying to take a screenshot of a control and OCR it and ran into the same problem. In the end I took the image of the control, copied the image into a larger bitmap and OCRed the larger bitmap.

Another issue I found was that you must have a proper extension for your image file. In other words, .tmp doesn't cut it.

I kept the work of creating a larger source inside my OCR method, which looks something like this (I deal directly with Image objects):

public static string ExtractText(this Image image)
{
    var tmpFile = Path.GetTempFileName();
    string text;
    try
    {
        var bmp = new Bitmap(Math.Max(image.Width, 1024), Math.Max(image.Height, 768));
        var gfxResize = Graphics.FromImage(bmp);
        gfxResize.DrawImage(image, new Rectangle(0, 0, image.Width, image.Height));
        bmp.Save(tmpFile + ".bmp", ImageFormat.Bmp);
        var doc = new MODI.Document();
        doc.Create(tmpFile + ".bmp");
        doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
        var img = (MODI.Image)doc.Images[0];
        var layout = img.Layout;
        text = layout.Text;
    }
    finally
    {
        File.Delete(tmpFile);
        File.Delete(tmpFile + ".bmp");
    }

    return text;
}

I'm not sure exactly what the minimum size is, but it appears as though 1024 x 768 does the trick.

查看更多
劫难
5楼-- · 2019-03-16 00:35

what solved my situation was using a photo editor (Paint.NET) and use the sharpen effect at maximum.

I also used: doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

查看更多
虎瘦雄心在
6楼-- · 2019-03-16 00:43
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

Which means that I don't want it to detect the orientation and not fix any skewing. Now the command works fine on all images including the 2400x2496 tiff.

But image should be in .tif.

Hope this helps out people facing the same problem.

查看更多
放荡不羁爱自由
7楼-- · 2019-03-16 00:45

I had the same problem "OCR running problem" with some images. I re-scaled the image (in my case by 50%), i.e. reduced its size and voila! it works!

查看更多
登录 后发表回答