I am working information reading from Identity Card information using Tesseract Library.I got Confidence score of each word or each line.
Box[0]: x=13, y=12, w=1134, h=57, confidence: 40, text: REPUYBLIQUE FRANCAISE
Box[1]: x=21, y=75, w=1119, h=50, confidence: 42, text: 7 NN99 3W F 59W
Box[2]: x=17, y=137, w=539, h=52, confidence: 30, text: V7 7 D5 NOM1BOHEL
Box[3]: x=6, y=189, w=954, h=46, confidence: 0, text:
Box[4]: x=12, y=239, w=1016, h=34, confidence: 40, text: 5 Q HV2 H CHRISTIANL NICBLE HBNIOIJE
Box[5]: x=21, y=310, w=975, h=53, confidence: 67, text: 2 E 20 06 1329
Box[6]: x=28, y=372, w=1043, h=83, confidence: 0, text:
Box[7]: x=11, y=397, w=1147, h=67, confidence: 0, text:
Box[8]: x=251, y=461, w=837, h=46, confidence: 0, text:
Box[9]: x=157, y=475, w=1019, h=105, confidence: 0, text:
Box[10]: x=59, y=648, w=1045, h=32, confidence: 81, text: IDFRADOUEL<<<<<<<<<<<<<<<<<<<<932013
Box[11]: x=57, y=722, w=1047, h=34, confidence: 76, text: 0506932020438CHRISTIANE<<NI2906209F3
Here is code used.
Pix *image = pixRead("/usr/src/tesseract-3.02/phototest.tif");
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
api->Init(NULL, "eng");
api->SetImage(image);
Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
printf("Found %d textline image components.\n", boxes->n);
for (int i = 0; i < boxes->n; i++) {
BOX* box = boxaGetBox(boxes, i, L_CLONE);
api->SetRectangle(box->x, box->y, box->w, box->h);
char* ocrResult = api->GetUTF8Text();
int conf = api->MeanTextConf();
fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
i, box->x, box->y, box->w, box->h, conf, ocrResult);
}
Now i need to read all the words from Identity card.But i set the value tesseract::RIL_TEXTLINE as tesseract::RIL_WORD and ran the code. I got high confidence value even words there not in image.
1.Is confidence score used to read information from Identity card.?
1.What is actually confidence score returned from tesseract OCR.?
Try to add the french language in the Init call.