Question:
Before marking this as a duplicate, please read the whole question first.
What I am able to do at present is as follows:
- Get the image and crop the desired part for OCR.
- Process the image using tesseract and leptonica.
- When the document is cropped into chunks, i.e. one character per image, recognition is about 96% accurate.
- If I don't do that, but the document has a white background and black text, accuracy is almost the same.
For example, if the input is this photo:
[photo]
What I want is to get the same accuracy for this photo without splitting it into blocks.
The code I use to initialize tesseract and extract text from the image is below.
To initialize tesseract, in the .h file:
tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;
In the implementation file (compiled as Objective-C++, i.e. a .mm file, because it uses the C++ API):
tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1");
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");
To get text from the image:
- (void)processOcrAt:(UIImage *)image
{
    [self setTesseractImage:image];
    tesseract->Recognize(NULL);
    char *utf8Text = tesseract->GetUTF8Text();
    int conf = tesseract->MeanTextConf();
    // GetUTF8Text() may return NULL, and a nil string would end the array early
    NSString *text = utf8Text ? [NSString stringWithUTF8String:utf8Text] : nil;
    if (text == nil)
        text = @"";
    // Tesseract allocates the result with new[], so it must be released with delete[]
    delete [] utf8Text;
    NSArray *arr = @[text, [NSString stringWithFormat:@"%d%%", conf]];
    [self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
                           withObject:arr
                        waitUntilDone:YES];
}

- (void)ocrProcessingFinished:(NSArray *)result
{
    UIAlertView *alt = [[UIAlertView alloc] initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
    [alt show];
}
But I don't get proper output for the number plate image: the result is either null or garbage.
If I use an image of the first kind, i.e. white background with black text, the output is 89 to 95% accurate.
Please help me out; any suggestion will be appreciated.
Update:
Thanks to @jcesar for providing the link and to @konstantin pribluda for valuable information and guidance.
I am now able to convert images into (almost) proper black and white form, so recognition is better for all images :)
I still need help with proper binarization of images. Any idea will be appreciated.
Answer 1:
Hi all, thanks for your replies. From all of them I reached the following conclusion:
- Get a single cropped image block that contains only the number plate.
- Within that plate, find the portion containing the number, using the pixel data obtained with the method provided here.
- Convert the image data to (almost) black and white using the RGB data found through the above method.
- Convert the data back to an image using the method provided here.
These four steps are combined into one method as follows:
-(void)getRGBAsFromImage:(UIImage*)image
{
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    // Use the pixel dimensions of the CGImage, not image.size (which is in points)
    NSUInteger count = width * height;
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now rawData contains the image data in the RGBA8888 pixel format.
    // Required_Value_of_red/green/blue are placeholder thresholds (0 to 255) to tune for your images.
    NSUInteger byteIndex = 0;
    for (NSUInteger ii = 0; ii < count; ++ii)
    {
        unsigned char red   = rawData[byteIndex];
        unsigned char green = rawData[byteIndex + 1];
        unsigned char blue  = rawData[byteIndex + 2];
        if (red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue)
        {
            // Set all channels to 255 to get a white background
            rawData[byteIndex]     = 255;
            rawData[byteIndex + 1] = 255;
            rawData[byteIndex + 2] = 255;
            rawData[byteIndex + 3] = 255;
        }
        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
        rawData,
        width,
        height,
        8,          // bitsPerComponent
        4 * width,  // bytesPerRow
        colorSpace,
        kCGImageAlphaNoneSkipLast);
    CGColorSpaceRelease(colorSpace);
    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);
    CGContextRelease(bitmapContext);
    UIImage *img = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);
    // use img for the OCR step
    free(rawData);
}
Note:
The only drawbacks of this method are the time it takes and having to pick the RGB values that decide which pixels are turned white and which are treated as black.
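The loop above depends on hand-picked Required_Value_of_* thresholds. One standard way to choose a global threshold automatically, which neither answer uses, is Otsu's method: it picks the gray level that best separates the histogram into a dark class and a light class. The sketch below is plain C, so it can sit next to the Objective-C code. It assumes a tightly packed RGBA8888 buffer like rawData above (bytes in R, G, B, A order), and the function names rgba_to_gray, otsu_threshold and binarize are hypothetical. The resulting 0/255 buffer could be passed to Tesseract through TessBaseAPI::SetImage with bytes_per_pixel = 1 and bytes_per_line = width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RGBA8888 (bytes R,G,B,A, rows tightly packed) -> 8-bit gray,
   using Rec. 709 luma weights and rounding to the nearest level. */
void rgba_to_gray(const uint8_t *rgba, size_t count, uint8_t *gray)
{
    assert(rgba != NULL && gray != NULL);
    for (size_t i = 0; i < count; ++i) {
        const uint8_t *p = rgba + 4 * i;
        gray[i] = (uint8_t)(0.2126f * p[0] + 0.7152f * p[1] + 0.0722f * p[2] + 0.5f);
    }
}

/* Otsu's method: return the level t that maximises the between-class
   variance when pixels <= t form one class and pixels > t the other. */
int otsu_threshold(const uint8_t *gray, size_t count)
{
    assert(gray != NULL || count == 0);
    unsigned long hist[256] = {0};
    for (size_t i = 0; i < count; ++i)
        hist[gray[i]]++;

    double sum_all = 0.0;
    for (int v = 0; v < 256; ++v)
        sum_all += (double)v * hist[v];

    double sum_dark = 0.0;      /* sum of levels in the dark class */
    unsigned long w_dark = 0;   /* number of pixels in the dark class */
    double best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w_dark += hist[t];
        if (w_dark == 0)
            continue;
        unsigned long w_light = (unsigned long)count - w_dark;
        if (w_light == 0)
            break;
        sum_dark += (double)t * hist[t];
        double diff = sum_dark / w_dark - (sum_all - sum_dark) / w_light;
        double between = (double)w_dark * (double)w_light * diff * diff;
        if (between > best_var) {
            best_var = between;
            best_t = t;
        }
    }
    return best_t;
}

/* In place: levels <= t become 0 (ink), the rest 255 (background). */
void binarize(uint8_t *gray, size_t count, int t)
{
    assert(gray != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        gray[i] = (gray[i] <= t) ? 0 : 255;
}
```

A single global threshold still struggles with shadows or uneven lighting across a plate, so it is best applied to the tightly cropped plate region.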
UPDATE:
CGImageRef imageRef = [plate CGImage];
CIContext *context = [CIContext contextWithOptions:nil];
CIImage *ciImage = [CIImage imageWithCGImage:imageRef];
// A white inputColor at full intensity turns the image into a grayscale version
CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome"
                              keysAndValues:@"inputImage", ciImage,
                                            @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f],
                                            @"inputIntensity", [NSNumber numberWithFloat:1.f], nil];
CIImage *ciResult = [filter valueForKey:kCIOutputImageKey];
CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
UIImage *img = [UIImage imageWithCGImage:cgImage];
// createCGImage:fromRect: returns an owned reference, so release it
CGImageRelease(cgImage);
Just replace the body of the method above (getRGBAsFromImage:) with this code; the result is the same, but it takes only 0.1 to 0.3 seconds.
Answer 2:
I was able to get near-instant results with the demo photo provided, and it recognized the correct letters.
I pre-processed the image using GPUImage:
// Pre-processing for OCR: a single global luminance threshold
GPUImageLuminanceThresholdFilter *thresholdFilter = [[GPUImageLuminanceThresholdFilter alloc] init];
[thresholdFilter setThreshold:0.3f];
[self setProcessedImage:[thresholdFilter imageByFilteringImage:_image]];
and then sent the processed image to Tesseract:
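- (NSArray *)processOcrAt:(UIImage *)image {
    [self setTesseractImage:image];
    _tesseract->Recognize(NULL);
    char *utf8Text = _tesseract->GetUTF8Text();
    NSString *text = utf8Text ? [NSString stringWithUTF8String:utf8Text] : nil;
    // GetUTF8Text() allocates with new[]; release it to avoid a leak
    delete [] utf8Text;
    return [self ocrProcessingFinished:(text ? text : @"")];
}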
- (NSArray *)ocrProcessingFinished:(NSString *)result {
    // Strip extra characters, whitespace/newlines
    NSString *results_noNewLine = [result stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    NSArray *results_noWhitespace = [results_noNewLine componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString *results_final = [results_noWhitespace componentsJoinedByString:@""];
    results_final = [results_final lowercaseString];
    // Separate out individual letters
    NSMutableArray *letters = [[NSMutableArray alloc] initWithCapacity:results_final.length];
    for (NSUInteger i = 0; i < [results_final length]; i++) {
        NSString *newTile = [results_final substringWithRange:NSMakeRange(i, 1)];
        [letters addObject:newTile];
    }
    return [NSArray arrayWithArray:letters];
}
- (void)setTesseractImage:(UIImage *)image {
    free(_pixels);
    _pixels = NULL;
    CGImageRef cgImage = [image CGImage];
    // Use the pixel dimensions of the CGImage rather than image.size (points)
    int width = (int)CGImageGetWidth(cgImage);
    int height = (int)CGImageGetHeight(cgImage);
    if (width <= 0 || height <= 0)
        return;
    // the pixels will be painted to this array;
    // calloc zero-fills it so any transparency is preserved
    _pixels = (uint32_t *) calloc((size_t)width * height, sizeof(uint32_t));
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);
    CGColorSpaceRelease(colorSpace);
    // paint the bitmap to our context which will fill in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), cgImage);
    CGContextRelease(context);
    _tesseract->SetImage((const unsigned char *) _pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));
}
This produced ' marks in place of the - characters, but these are easy to remove. Depending on your image set you may have to fine-tune it a bit, but it should get you moving in the right direction.
Let me know if you have problems using it. It's from a project I'm working on, and I didn't want to strip everything out or create a project from scratch for it.
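The stray ' marks mentioned above can be dropped together with whitespace by keeping only letters and digits. The sketch below is plain C and assumes the plate text should match the question's whitelist (0-9 and A-Z), so it upper-cases instead of lower-casing as Answer 2 does. The function name keep_plate_chars is hypothetical; it would be applied to the char * from GetUTF8Text() before that buffer is deleted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy only ASCII letters and digits from src into dst, upper-cased,
   dropping whitespace, newlines and marks such as ' or -.
   dst must have room for strlen(src) + 1 bytes. Returns the new length. */
size_t keep_plate_chars(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    for (const char *s = src; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;  /* ctype functions need unsigned char values */
        if (isalnum(c))
            dst[n++] = (char)toupper(c);
    }
    dst[n] = '\0';
    return n;
}
```

Because dst can never be longer than src, the same buffer may be passed as both arguments to clean the text in place.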
Answer 3:
I daresay that tesseract will be overkill for your purpose. You do not need dictionary matching to improve recognition quality (you do not have such a dictionary, but you may have a way to compute a checksum on the license number), and you have a font optimised for OCR.
And best of all, you have markers (the orange and blue color areas nearby are good) for finding the region of interest in the image.
In my OCR apps I use human-assisted area-of-interest retrieval (just an aiming-help overlay over the camera preview). Usually one uses something like a Haar cascade to locate interesting features such as faces. You may also calculate the centroid of the orange area, or just the bounding box of the orange pixels, simply by traversing the whole image and storing the leftmost / rightmost / topmost / bottommost pixels of a suitable color.
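The single-pass scan described above can be sketched in plain C as follows. It assumes an RGBA8888 buffer in R, G, B, A byte order. The is_orange rule and its RGB ranges are hypothetical placeholders that would need tuning on real plates, as are the names ColorRegion and find_color_region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t min_x, min_y, max_x, max_y; /* inclusive bounding box */
    double cx, cy;                     /* centroid of matching pixels */
    size_t count;                      /* number of matching pixels */
} ColorRegion;

/* Hypothetical "orange" test on an R,G,B,A pixel; tune the ranges. */
static bool is_orange(const uint8_t *p)
{
    return p[0] > 180 && p[1] > 80 && p[1] < 180 && p[2] < 80;
}

/* Scan the image once, tracking the extreme coordinates and the mean
   position of every pixel that passes is_orange().
   Returns false if no pixel matched. */
bool find_color_region(const uint8_t *rgba, size_t width, size_t height,
                       size_t bytes_per_row, ColorRegion *r)
{
    assert(rgba != NULL && r != NULL);
    double sx = 0.0, sy = 0.0;
    r->count = 0;
    r->min_x = width;  r->min_y = height;
    r->max_x = 0;      r->max_y = 0;
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            const uint8_t *p = rgba + y * bytes_per_row + 4 * x;
            if (!is_orange(p))
                continue;
            if (x < r->min_x) r->min_x = x;
            if (x > r->max_x) r->max_x = x;
            if (y < r->min_y) r->min_y = y;
            if (y > r->max_y) r->max_y = y;
            sx += (double)x;
            sy += (double)y;
            r->count++;
        }
    }
    if (r->count == 0)
        return false;
    r->cx = sx / (double)r->count;
    r->cy = sy / (double)r->count;
    return true;
}
```

The bounding box (or a window around the centroid, offset to where the digits sit relative to the colored marker) then gives the crop that is passed on to binarization and recognition.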
As for recognition itself, I would recommend using invariant moments (I am not sure whether they are implemented in tesseract, but you can easily port them from our Java project: http://sourceforge.net/projects/javaocr/ ).
I tried my demo app on an image shown on a monitor and it recognized the digits on the spot (it is not trained for letters).
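To show what invariant moments are, here is a standalone plain C sketch (not a port of the JavaOCR code) that computes the first two of Hu's moment invariants for a binary glyph, where non-zero bytes are ink. With normalised central moments eta_pq = mu_pq / m00^(1 + (p+q)/2), the invariants are phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)^2 + 4 * eta11^2. The name hu_moments_2 is hypothetical, and matching the resulting feature vector against trained templates (for example by nearest neighbour) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First two Hu moment invariants of a binary glyph (non-zero = ink).
   They are exactly invariant to translation, and approximately invariant
   to scale and rotation on a pixel grid.
   Returns 0 on success, -1 for an empty glyph. */
int hu_moments_2(const uint8_t *img, size_t width, size_t height, double hu[2])
{
    assert(img != NULL && hu != NULL);
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;   /* raw moments */
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            if (img[y * width + x]) {
                m00 += 1.0;
                m10 += (double)x;
                m01 += (double)y;
            }
    if (m00 == 0.0)
        return -1;
    double xc = m10 / m00, yc = m01 / m00;    /* centroid */

    double mu20 = 0.0, mu02 = 0.0, mu11 = 0.0; /* central moments */
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            if (img[y * width + x]) {
                double dx = (double)x - xc, dy = (double)y - yc;
                mu20 += dx * dx;
                mu02 += dy * dy;
                mu11 += dx * dy;
            }

    /* for p + q = 2 the normalisation exponent is 2 */
    double n = m00 * m00;
    double eta20 = mu20 / n, eta02 = mu02 / n, eta11 = mu11 / n;
    hu[0] = eta20 + eta02;
    hu[1] = (eta20 - eta02) * (eta20 - eta02) + 4.0 * eta11 * eta11;
    return 0;
}
```

For example, a two-pixel horizontal bar in one corner and a two-pixel vertical bar elsewhere in the image give identical values (phi1 = 0.125, phi2 = 0.015625), since one is a translated, 90-degree rotation of the other.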
As for binarisation (separating black from white), I would recommend the Sauvola method, as it gives the best tolerance to luminance changes (it is also implemented in our OCR project).
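Sauvola's method computes a separate threshold for every pixel from the mean m and standard deviation s of a window around it: T = m * (1 + k * (s / R - 1)), with R = 128 for 8-bit data. The following plain C sketch is independent of the JavaOCR implementation mentioned above. The window half-size and k are assumptions to tune; values of k around 0.2 to 0.5 are common starting points. Integral images of the pixel values and their squares keep the cost per pixel constant regardless of window size. The name sauvola_binarize is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sauvola binarisation of an 8-bit grayscale image into 0 (ink) / 255.
   Window: (2*half+1)^2 pixels, clipped at the image borders.
   Returns 0 on success, -1 if memory allocation fails. */
int sauvola_binarize(const uint8_t *gray, int width, int height,
                     int half, double k, uint8_t *out)
{
    assert(gray != NULL && out != NULL);
    assert(width > 0 && height > 0 && half >= 0 && k >= 0.0);
    const double R = 128.0;                 /* dynamic range of std dev */
    size_t stride = (size_t)width + 1;
    size_t cells = stride * ((size_t)height + 1);
    /* integral images: sum[Y*stride+X] = sum over rows < Y, cols < X */
    double *sum = (double *)calloc(cells, sizeof *sum);
    double *sq  = (double *)calloc(cells, sizeof *sq);
    if (sum == NULL || sq == NULL) { free(sum); free(sq); return -1; }

    for (int y = 0; y < height; ++y) {
        double row = 0.0, row_sq = 0.0;
        for (int x = 0; x < width; ++x) {
            double v = gray[(size_t)y * width + x];
            row += v;
            row_sq += v * v;
            sum[(y + 1) * stride + x + 1] = sum[y * stride + x + 1] + row;
            sq[(y + 1) * stride + x + 1]  = sq[y * stride + x + 1] + row_sq;
        }
    }

    for (int y = 0; y < height; ++y) {
        int y0 = y - half < 0 ? 0 : y - half;
        int y1 = y + half >= height ? height - 1 : y + half;
        for (int x = 0; x < width; ++x) {
            int x0 = x - half < 0 ? 0 : x - half;
            int x1 = x + half >= width ? width - 1 : x + half;
            double n = (double)(x1 - x0 + 1) * (double)(y1 - y0 + 1);
            size_t a = y0 * stride + x0,       b = y0 * stride + x1 + 1;
            size_t c = (y1 + 1) * stride + x0, d = (y1 + 1) * stride + x1 + 1;
            double mean = (sum[d] - sum[b] - sum[c] + sum[a]) / n;
            double var = (sq[d] - sq[b] - sq[c] + sq[a]) / n - mean * mean;
            if (var < 0.0)
                var = 0.0;                  /* guard against rounding */
            /* g <= m*(1 + k*(s/R - 1))  <=>  g - m*(1-k) <= (m*k/R)*s.
               The right side is never negative, so squaring both sides
               avoids a sqrt() call. */
            double g = gray[(size_t)y * width + x];
            double lhs = g - mean * (1.0 - k);
            double coef = mean * k / R;
            int ink = lhs <= 0.0 || lhs * lhs <= coef * coef * var;
            out[(size_t)y * width + x] = ink ? 0 : 255;
        }
    }
    free(sum);
    free(sq);
    return 0;
}
```

The window should be roughly as large as a character stroke region; a window much smaller than the strokes turns the inside of thick characters into background.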