I'm trying to use MODI to OCR a window's program. It works fine for screenshots I grab programmatically using win32 interop like this:
public string SaveScreenShotToFile()
{
RECT rc;
GetWindowRect(_hWnd, out rc);
int width = rc.right - rc.left;
int height = rc.bottom - rc.top;
Bitmap bmp = new Bitmap(width, height);
Graphics gfxBmp = Graphics.FromImage(bmp);
IntPtr hdcBitmap = gfxBmp.GetHdc();
PrintWindow(_hWnd, hdcBitmap, 0);
gfxBmp.ReleaseHdc(hdcBitmap);
gfxBmp.Dispose();
string fileName = @"c:\temp\screenshots\" + Guid.NewGuid().ToString() + ".bmp";
bmp.Save(fileName);
return fileName;
}
This image is then saved to a file and ran through MODI like this:
private string GetTextFromImage(string fileName)
{
MODI.Document doc = new MODI.DocumentClass();
doc.Create(fileName);
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
MODI.Image img = (MODI.Image)doc.Images[0];
MODI.Layout layout = img.Layout;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < layout.Words.Count; i++)
{
MODI.Word word = (MODI.Word)layout.Words[i];
sb.Append(word.Text);
sb.Append(" ");
}
if (sb.Length > 1)
sb.Length--;
return sb.ToString();
}
This part works fine, however, I don't want to OCR the entire screenshot, just portions of it. I try cropping the image programmatically like this:
private string SaveToCroppedImage(Bitmap original)
{
Bitmap result = original.Clone(new Rectangle(0, 0, 250, 250), original.PixelFormat);
var fileName = "c:\\" + Guid.NewGuid().ToString() + ".bmp";
result.Save(fileName, original.RawFormat);
return fileName;
}
and then OCRing this smaller image, however MODI throws an exception; 'OCR running error', the error code is -959967087.
Why can MODI handle the original bitmap but not the smaller version taken from it?
yes the posts in this thread helped me gettin it to work, here what i have to add:
was trying to download images ( small ones ) then ocr...
-when processing images, it seems that theyr size must be power of 2 ! ( was able to ocr images: 512x512 , 128x128, 256x64 .. other sizes mostly failed ( like 1103x334 ))
transparent background also made troubles. I got the best results when creating a new tif with powerof2 boundary, white background, paste the downloaded image into it, save.
scaling the image did not succeed for me, since OCR is getting wrong results , specially for "german" characters like "ü"
in the end i also used: doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);
using modi from office 2003
greetings
womd