How to read words from an identity card using Tesseract

Posted 2019-03-27 09:21

I am working on reading information from an identity card using the Tesseract library. I get a confidence score for each word or each line.

Image Link

Box[0]: x=13, y=12, w=1134, h=57, confidence: 40, text: REPUYBLIQUE FRANCAISE

Box[1]: x=21, y=75, w=1119, h=50, confidence: 42, text:    7  NN99 3W F 59W

Box[2]: x=17, y=137, w=539, h=52, confidence: 30, text:   V7 7  D5 NOM1BOHEL

Box[3]: x=6, y=189, w=954, h=46, confidence: 0, text: 
Box[4]: x=12, y=239, w=1016, h=34, confidence: 40, text:      5   Q  HV2 H CHRISTIANL NICBLE  HBNIOIJE

Box[5]: x=21, y=310, w=975, h=53, confidence: 67, text:   2 E    20 06 1329

Box[6]: x=28, y=372, w=1043, h=83, confidence: 0, text: 
Box[7]: x=11, y=397, w=1147, h=67, confidence: 0, text: 
Box[8]: x=251, y=461, w=837, h=46, confidence: 0, text: 
Box[9]: x=157, y=475, w=1019, h=105, confidence: 0, text: 
Box[10]: x=59, y=648, w=1045, h=32, confidence: 81, text: IDFRADOUEL<<<<<<<<<<<<<<<<<<<<932013

Box[11]: x=57, y=722, w=1047, h=34, confidence: 76, text: 0506932020438CHRISTIANE<<NI2906209F3

Here is the code I used.

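#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
  Pix *image = pixRead("/usr/src/tesseract-3.02/phototest.tif");
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  api->Init(NULL, "eng");  // loads the English language data only
  api->SetImage(image);
  // Find the bounding box of every text line, then OCR each box separately.
  Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
  printf("Found %d textline image components.\n", boxes->n);
  for (int i = 0; i < boxes->n; i++) {
    BOX* box = boxaGetBox(boxes, i, L_CLONE);
    api->SetRectangle(box->x, box->y, box->w, box->h);
    char* ocrResult = api->GetUTF8Text();
    int conf = api->MeanTextConf();  // average confidence of the text, 0-100
    fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
                    i, box->x, box->y, box->w, box->h, conf, ocrResult);
    delete[] ocrResult;  // GetUTF8Text() returns a buffer the caller must free
    boxDestroy(&box);
  }
  boxaDestroy(&boxes);
  api->End();
  delete api;
  pixDestroy(&image);
  return 0;
}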

Now I need to read all the words from the identity card. I changed tesseract::RIL_TEXTLINE to tesseract::RIL_WORD and ran the code, but I got high confidence values even for words that are not in the image.

image link

1. Can the confidence score be used when reading information from an identity card?

2. What exactly is the confidence score returned by Tesseract OCR?
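
For context on the second question: MeanTextConf() reports the average confidence of the recognized text as an integer from 0 to 100, so it is the recognizer's own estimate and not proof that the text is correct. A minimal sketch of screening the lines listed earlier by that score follows. The OcrLine struct, the keepConfident helper, and the threshold of 60 are assumptions for illustration and are not part of Tesseract.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one OCR result (not a Tesseract type).
struct OcrLine {
  int conf;          // mean confidence, 0-100
  std::string text;  // recognized text
};

// Keep only non-empty lines whose confidence is at least minConf.
std::vector<OcrLine> keepConfident(const std::vector<OcrLine>& lines, int minConf) {
  std::vector<OcrLine> kept;
  for (const OcrLine& line : lines) {
    if (line.conf >= minConf && !line.text.empty()) {
      kept.push_back(line);
    }
  }
  return kept;
}

// A few confidence/text pairs copied from the output listed above.
std::vector<OcrLine> sampleLines() {
  return {
    {40, "REPUYBLIQUE FRANCAISE"},
    {0, ""},
    {67, "  2 E    20 06 1329"},
    {81, "IDFRADOUEL<<<<<<<<<<<<<<<<<<<<932013"},
    {76, "0506932020438CHRISTIANE<<NI2906209F3"},
  };
}
```

With the assumed threshold of 60, only the date line and the two machine-readable lines at the bottom of the card remain. A filter like this only discards results the engine itself is unsure of; it cannot catch text that is wrong but scored highly, which is the situation described above.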

1 answer
做个烂人
#2 · 2019-03-27 10:08

Try adding the French language in the Init call.
