Cannot use ChoiceIterator in Tesseract

Published 2019-04-15 22:16

Question:

First of all, I want to confirm that I understand the choice iterator correctly.

For example, if I have the word "scope" in an image, the choice iterator should give me something like "s" and, after Next(), maybe "5".

For the third letter, "o", it might give me "0", then "O" after Next(), and "o" after another Next().

Do I understand this correctly?
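To make the question concrete, the model below treats each recognized symbol as holding a list of alternative readings that a Next()-style iterator steps through. It is a plain C++ sketch with no Tesseract dependency: `Choice`, `Symbol`, `MockChoiceIterator`, `AllChoices`, the sample confidences, and the assumption that alternatives are stored best-first are all hypothetical, not Tesseract's internal structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one alternative reading and its score.
struct Choice {
    std::string text;
    float confidence;
};

// Hypothetical model of a recognized symbol; alternatives assumed best-first.
struct Symbol {
    std::vector<Choice> choices;
};

// Mimics the GetUTF8Text()/Next() calling style on a single symbol.
class MockChoiceIterator {
public:
    explicit MockChoiceIterator(const Symbol& s) : symbol_(s), index_(0) {}

    // Text of the current alternative, or nullptr if none were stored.
    const char* GetUTF8Text() const {
        if (index_ >= symbol_.choices.size()) return nullptr;
        return symbol_.choices[index_].text.c_str();
    }

    // Moves to the next alternative; returns false when there are no more.
    bool Next() {
        if (index_ + 1 >= symbol_.choices.size()) return false;
        ++index_;
        return true;
    }

private:
    const Symbol& symbol_;
    std::size_t index_;
};

// Collects all alternatives for one symbol in stored order.
std::vector<std::string> AllChoices(const Symbol& s) {
    std::vector<std::string> out;
    MockChoiceIterator it(s);
    do {
        if (const char* text = it.GetUTF8Text()) out.push_back(text);
    } while (it.Next());
    return out;
}
```

For the third letter of "scope", filling a `Symbol` with the hypothetical alternatives "o", "O" and "0" makes `AllChoices` return them in that order; a symbol with no stored alternatives makes `GetUTF8Text()` return a null pointer.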

Here is all my related code:

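// Pass the OpenCV IplImage buffer to Tesseract
// (bytes per pixel = number of channels * bytes per channel)
api.SetImage((uchar*)img->imageData, img->width, img->height,
             img->nChannels * (img->depth / 8), img->widthStep);
api.SetRectangle(0, 0, img->width, img->height);
api.Recognize(NULL);

// Iterator over the recognition result; the caller must delete it
tesseract::ResultIterator* ri = api.GetIterator();

// ChoiceIterator walks the alternatives for the symbol ri currently points at
const tesseract::ResultIterator itr = *ri;
tesseract::ChoiceIterator* choiceItr = new tesseract::ChoiceIterator(itr);

const char* out = choiceItr->GetUTF8Text();           // owned by choiceItr, do not delete
char* out2 = ri->GetUTF8Text(tesseract::RIL_SYMBOL);  // caller must delete[]
printf("out:%s,out2:%s", out, out2);

delete[] out2;
delete choiceItr;
delete ri;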

The console output is:

out:(null),out2:P

"P" is the expected ResultIterator result, but the ChoiceIterator output is null.

Thanks for any ideas.

Approximately solved:

// This ensures Tesseract's "blob_choices" structures are filled.
// Set it before calling Recognize().
api.SetVariable("save_best_choices", "T");

http://code.google.com/p/tesseract-ocr/issues/detail?id=555
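According to the comment in the workaround, `save_best_choices` is what makes Tesseract fill its `blob_choices` structures, so it presumably has to be set before `Recognize()` runs. The toy model below illustrates that ordering dependency and why the question printed `(null)`: a choice iterator over an empty list of alternatives has no text to return. `MockEngine`, its `Best()` and `FirstChoice()` methods, and the alternatives "P", "p", "F" are hypothetical and do not mirror Tesseract's real classes. Note also that passing a null pointer to `printf`'s `%s` is undefined behavior; glibc happens to print `(null)`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an OCR engine; NOT Tesseract's real API.
class MockEngine {
public:
    void SetVariable(const std::string& name, const std::string& value) {
        vars_[name] = value;
    }

    // "Recognizes" one fixed symbol. Alternatives are recorded only if
    // save_best_choices was already "T" at recognition time.
    void Recognize() {
        best_ = "P";
        alternatives_.clear();
        auto it = vars_.find("save_best_choices");
        if (it != vars_.end() && it->second == "T") {
            alternatives_ = {"P", "p", "F"};  // made-up alternatives
        }
    }

    // Best reading, analogous to the ResultIterator's symbol text.
    const std::string& Best() const { return best_; }

    // First stored alternative, or nullptr when none were saved.
    const char* FirstChoice() const {
        return alternatives_.empty() ? nullptr : alternatives_.front().c_str();
    }

private:
    std::map<std::string, std::string> vars_;
    std::string best_;
    std::vector<std::string> alternatives_;
};
```

In this model, setting the flag after `Recognize()` still yields a best reading of "P" but a null first choice, which matches the symptom in the question; setting it first makes the alternatives available.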

Answer 1:

In case you haven't found a solution yet, the following code shows how to iterate over all characters (using ResultIterator) and their best alternatives (using ChoiceIterator).

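#include <iostream>
#include <tesseract/baseapi.h>
#include <tesseract/resultiterator.h>

// tess is a tesseract::TessBaseAPI that has already been initialized.
// save_best_choices must be set before Recognize() so that the
// per-symbol alternatives are kept for ChoiceIterator.
tess.SetVariable("save_best_choices", "T");
tess.SetImage(...);
tess.Recognize(0);

tesseract::ResultIterator* ri = tess.GetIterator();

if(ri != 0)
{
    do
    {
        // Text of the current symbol; the caller owns the returned buffer
        const char* symbol = ri->GetUTF8Text(tesseract::RIL_SYMBOL);

        if(symbol != 0)
        {
            float conf = ri->Confidence(tesseract::RIL_SYMBOL);
            std::cout << "\tnext symbol: " << symbol << "\tconf: " << conf << "\n";

            // Walk all alternatives Tesseract kept for this symbol
            const tesseract::ResultIterator itr = *ri;
            tesseract::ChoiceIterator* ci = new tesseract::ChoiceIterator(itr);

            do
            {
                // Points into the iterator's internal data: do not delete
                const char* choice = ci->GetUTF8Text();
                if(choice != 0)
                    std::cout << "\t\t" << choice << " conf: " << ci->Confidence() << "\n";
            }
            while(ci->Next());

            delete ci;
        }

        delete[] symbol;
    }
    while(ri->Next(tesseract::RIL_SYMBOL));

    delete ri;  // the iterator returned by GetIterator() must be deleted by the caller
}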
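The answer's loop nests two do/while walks: the outer one advances the ResultIterator symbol by symbol, the inner one exhausts the ChoiceIterator for the current symbol. The sketch below reproduces that control flow against hypothetical in-memory iterators (`MockResultIterator`, `MockChoiceIterator`, `DumpChoices` and the `Page` alias are made up so it compiles without Tesseract) and uses `std::unique_ptr` instead of manual `delete`. The same idea should carry over to real code: `std::unique_ptr<char[]>` for the string returned by `ResultIterator::GetUTF8Text`, and `std::unique_ptr` for the iterators.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical page model: each inner vector holds one symbol's
// alternatives, best first (the first entry is the recognized text).
using Page = std::vector<std::vector<std::string>>;

// Hypothetical stand-in for a symbol-level result iterator.
class MockResultIterator {
public:
    explicit MockResultIterator(const Page& page) : page_(page), pos_(0) {}

    bool Empty() const { return page_.empty(); }

    // Like ResultIterator::GetUTF8Text: returns a new[]-allocated copy
    // that the caller must release with delete[] (nullptr if no text).
    char* GetUTF8Text() const {
        const std::vector<std::string>& choices = page_[pos_];
        if (choices.empty()) return nullptr;
        const std::string& best = choices.front();
        char* buf = new char[best.size() + 1];
        std::memcpy(buf, best.c_str(), best.size() + 1);
        return buf;
    }

    // Advances to the next symbol; false once the page is exhausted.
    bool Next() { return ++pos_ < page_.size(); }

    const std::vector<std::string>& Choices() const { return page_[pos_]; }

private:
    const Page& page_;
    std::size_t pos_;
};

// Hypothetical stand-in for a choice iterator over the current symbol.
class MockChoiceIterator {
public:
    explicit MockChoiceIterator(const MockResultIterator& ri)
        : choices_(ri.Choices()), idx_(0) {}

    // Points into internal storage: must NOT be deleted by the caller.
    const char* GetUTF8Text() const {
        return idx_ < choices_.size() ? choices_[idx_].c_str() : nullptr;
    }

    bool Next() { return ++idx_ < choices_.size(); }

private:
    const std::vector<std::string>& choices_;
    std::size_t idx_;
};

// Same control flow as the answer: an outer do/while over symbols and an
// inner do/while over each symbol's alternatives, with RAII instead of delete.
std::string DumpChoices(const Page& page) {
    std::ostringstream out;
    auto ri = std::make_unique<MockResultIterator>(page);
    if (ri->Empty()) return out.str();
    do {
        std::unique_ptr<char[]> symbol(ri->GetUTF8Text());  // released with delete[]
        if (symbol) {
            out << "symbol " << symbol.get() << ":";
            auto ci = std::make_unique<MockChoiceIterator>(*ri);
            do {
                if (const char* choice = ci->GetUTF8Text()) out << ' ' << choice;
            } while (ci->Next());
            out << '\n';
        }
    } while (ri->Next());
    return out.str();
}
```

With a hypothetical page whose symbols have the alternatives {"s", "5"}, {"c"} and {"o", "O", "0"}, `DumpChoices` produces one line per symbol, for example `symbol o: o O 0`.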


Tags: ocr tesseract