The Tesseract OCR engine isn't able to read th

Posted 2020-04-16 02:00

Question:

I'm using a .NET wrapper for the Tesseract OCR engine. I have a large document that is a PNG. When I cut out a section of the image in MS Paint and feed it into the engine, it works. But when I crop the same section in code, the engine can't recognize the text in the image. The two images look the same and their properties don't differ much, so I'm a little confused.

Here are the two images. From MS Paint:

From code:

This is what I get from the MS Paint image:

And through code:

They're really similar, so I'm not sure why it can't recognize the text in the second one. The following is how I'm generating the image:

public Bitmap CropImage(Bitmap source, Rectangle section)
{
    Bitmap bmp = new Bitmap(section.Width, section.Height);
    Graphics g = Graphics.FromImage(bmp);
    g.DrawImage(source, 0, 0, section, GraphicsUnit.Pixel);

    return bmp;
}

private void Form1_Load(object sender, EventArgs e)
{
    Bitmap source = new Bitmap(test);
    Rectangle section = new Rectangle(new Point(78, 65), new Size(800, 50));
    Bitmap CroppedImage = CropImage(source, section);
    CroppedImage.Save(@"c:\users\user\desktop\test34.png", System.Drawing.Imaging.ImageFormat.Png);

    this.pictureBox1.Image = CroppedImage;
}

Answer 1:

The default resolution of a new Bitmap is 96 DPI, which is not adequate for OCR purposes. Try increasing it to 300 DPI, for example:

bmp.SetResolution(300, 300);
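
For context, here is a minimal sketch of where that call could sit in the question's crop routine. The class name CropHelper and the method name CropAt300Dpi are made up for illustration. The sketch also passes an explicit destination rectangle to DrawImage, on the assumption that the position-only overloads draw at the image's physical size and may therefore rescale the crop when the source and target DPI differ. Treat it as one possible arrangement, not as the fix the asker ended up using.

```csharp
using System.Drawing;

public static class CropHelper
{
    // Hypothetical helper: crops `section` out of `source` and tags the result as 300 DPI.
    public static Bitmap CropAt300Dpi(Bitmap source, Rectangle section)
    {
        Bitmap bmp = new Bitmap(section.Width, section.Height);

        // A new Bitmap defaults to 96 DPI; mark it as 300 DPI for the OCR engine.
        bmp.SetResolution(300, 300);

        using (Graphics g = Graphics.FromImage(bmp))
        {
            // Explicit destination rectangle in pixels, so the section is copied
            // at its original pixel size regardless of either image's DPI.
            Rectangle dest = new Rectangle(0, 0, section.Width, section.Height);
            g.DrawImage(source, dest, section, GraphicsUnit.Pixel);
        }

        return bmp;
    }
}
```

With the rectangle from the question, CropHelper.CropAt300Dpi(source, new Rectangle(78, 65, 800, 50)) would still return an 800 x 50 pixel bitmap; only the resolution stored with it changes.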

Update 1: When you rescale the image to a higher DPI, its pixel dimensions should change as well. Here's an example rescale function:

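using System.Drawing;
using System.Drawing.Drawing2D;

public static Image Rescale(Image image, int dpiX, int dpiY)
{
    // Scale the pixel dimensions by the ratio of the target DPI to the current DPI.
    int width = (int)(image.Width * dpiX / image.HorizontalResolution);
    int height = (int)(image.Height * dpiY / image.VerticalResolution);

    Bitmap bm = new Bitmap(width, height);
    bm.SetResolution(dpiX, dpiY);

    // The using block disposes the Graphics object even if drawing throws.
    using (Graphics g = Graphics.FromImage(bm))
    {
        g.InterpolationMode = InterpolationMode.Bicubic;
        g.PixelOffsetMode = PixelOffsetMode.HighQuality;
        // Drawing at a position uses the image's physical size, so it fills the larger bitmap.
        g.DrawImage(image, 0, 0);
    }

    return bm;
}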
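
To make the size calculation concrete, the arithmetic can be pulled out into a small helper and applied to the 800 x 50 crop from the question. The names RescaleMath and ScaledLength are hypothetical, and the source resolution of 96 DPI is the default for a new Bitmap mentioned above, not a value measured from the asker's file.

```csharp
public static class RescaleMath
{
    // Same expression Rescale uses for each dimension:
    // pixels * targetDpi / sourceDpi, truncated to an int.
    public static int ScaledLength(int pixels, int targetDpi, float sourceDpi)
    {
        return (int)(pixels * targetDpi / sourceDpi);
    }
}
```

Under those assumptions, ScaledLength(800, 300, 96f) gives 2500 and ScaledLength(50, 300, 96f) gives 156 (156.25 truncated), so the rescaled crop would be 2500 x 156 pixels.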