I'm setting `language_model_penalty_non_dict_word` through a config file for Tesseract 3.01, but its value has no effect. I've tried several images and several values, and the output for each image is always the same. Another user noticed the same thing in a comment on a different question.
Edit: After looking at the source, I found that the variable `language_model_penalty_non_dict_word` is used only inside the function `float LanguageModel::ComputeAdjustedPathCost`.
However, in my runs this function is never called! It is referenced by only two functions: `LanguageModel::UpdateBestChoice()` and `LanguageModel::AddViterbiStateEntry()`. I placed breakpoints in those functions, but they weren't being called either.
After some debugging, I finally found the reason: the function `Wordrec::SegSearch()` wasn't being called, and it sits upstream of `LanguageModel::ComputeAdjustedPathCost()` in the call graph. From this code:
So you need to set `enable_new_segsearch` in the config file:
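A minimal config file might look like the following. This is a sketch that assumes Tesseract's usual one-`name value`-per-line config format and that `1` switches a boolean parameter on. The penalty value `0.5` is an arbitrary example, not a recommended setting:

```
enable_new_segsearch 1
language_model_penalty_non_dict_word 0.5
```

Assuming the file is saved as `myconfig` (a hypothetical name) somewhere Tesseract can find it, such as `tessdata/configs/`, you would pass it after the output base name: `tesseract image.tif out myconfig`. With the flag on, the segmentation search that reaches `ComputeAdjustedPathCost` should run, so changing the penalty can affect the result. Whether the final text actually changes will still depend on the image.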