The Tesseract OCR engine isn't able to read th

I'm using a .NET wrapper for the Tesseract OCR engine. I have a large document stored as a PNG. When I cut out a section of the image in MS Paint and feed it to the engine, the text is recognized. When I crop the same section in code, the engine can't recognize the text. The two images look the same and their properties don't differ much, so I'm a little confused.

Here are the two images. From MS Paint:

enter image description here

From code:

enter image description here

This is what I get from the MS Paint image:

enter image description here

And through code:

enter image description here

They're really similar, so I'm not sure why it can't recognize the text in the second image. The following is how I'm generating the image.

public Bitmap CropImage(Bitmap source, Rectangle section)
{
    Bitmap bmp = new Bitmap(section.Width, section.Height);
    // Dispose the Graphics object as soon as drawing is done.
    using (Graphics g = Graphics.FromImage(bmp))
    {
        g.DrawImage(source, 0, 0, section, GraphicsUnit.Pixel);
    }

    return bmp;
}

private void Form1_Load(object sender, EventArgs e)
{
    Bitmap source = new Bitmap(test);
    // Crop an 800x50 strip whose top-left corner is at (78, 65).
    Rectangle section = new Rectangle(new Point(78, 65), new Size(800, 50));
    Bitmap CroppedImage = CropImage(source, section);
    CroppedImage.Save(@"c:\users\user\desktop\test34.png", System.Drawing.Imaging.ImageFormat.Png);

    this.pictureBox1.Image = CroppedImage;
}

Tags: c# image-processing bitmap ocr tesseract

1 Answer

甜甜的少女心

Reply #2 · 2020-04-16 02:23

The default resolution of a new Bitmap is 96 DPI, which is not adequate for OCR. Try increasing it to 300 DPI, for example:

bmp.SetResolution(300, 300);
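
As a rough sketch of where that call could go in the question's crop routine, something like the following might work. The class and method names (CropSketch, CropAt300Dpi) are hypothetical and not from the original post, and the explicit destination rectangle is an assumption added here so that the copy does not depend on either bitmap's DPI setting.

```csharp
using System.Drawing;

public static class CropSketch
{
    // Hypothetical helper: crops `section` out of `source` and tags the result as 300 DPI.
    public static Bitmap CropAt300Dpi(Bitmap source, Rectangle section)
    {
        Bitmap bmp = new Bitmap(section.Width, section.Height);
        using (Graphics g = Graphics.FromImage(bmp))
        {
            // Explicit destination rectangle, so no DPI-based scaling is applied.
            Rectangle dest = new Rectangle(0, 0, section.Width, section.Height);
            g.DrawImage(source, dest, section, GraphicsUnit.Pixel);
        }

        // SetResolution changes only the DPI metadata; the pixel data is untouched.
        bmp.SetResolution(300, 300);
        return bmp;
    }
}
```

Because only the metadata changes here, the cropped bitmap keeps the same pixel dimensions as the requested section.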

Update 1: When you rescale the image, its dimensions should change as well. Here's an example rescale function:

// Requires: using System.Drawing; using System.Drawing.Drawing2D;
public static Image Rescale(Image image, int dpiX, int dpiY)
{
    // Scale the pixel dimensions by the ratio of target DPI to source DPI.
    Bitmap bm = new Bitmap((int)(image.Width * dpiX / image.HorizontalResolution), (int)(image.Height * dpiY / image.VerticalResolution));
    bm.SetResolution(dpiX, dpiY);
    using (Graphics g = Graphics.FromImage(bm))
    {
        g.InterpolationMode = InterpolationMode.Bicubic;
        g.PixelOffsetMode = PixelOffsetMode.HighQuality;
        g.DrawImage(image, 0, 0);
    }

    return bm;
}
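
To make the size calculation in the Rescale function concrete, here is a small sketch of the same arithmetic on its own. The class and method names (RescaleMath, ScaledLength) are made up for illustration, and the 96 DPI source value is the default mentioned above, not something measured from the asker's file.

```csharp
public static class RescaleMath
{
    // Same formula as in Rescale: pixels * targetDpi / sourceDpi, truncated to an int.
    public static int ScaledLength(int pixels, float sourceDpi, int targetDpi)
    {
        return (int)(pixels * targetDpi / sourceDpi);
    }
}
```

For the 800x50 crop from the question, assuming a 96 DPI source, ScaledLength(800, 96f, 300) gives 2500 and ScaledLength(50, 96f, 300) gives 156, so the rescaled bitmap would be 2500x156 pixels.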
