EDIT: mftraining gives the warning in the title for all the characters in the unicharset (so not just F, but also a, b, c, d, etc.). How do I create these protos/configs?
I'm following this tutorial
Previous question that is now solved:-
Error:Assert failedWarning:in file ....\classify\trainingsampleset.cpp, line 622 no protos/ Segmentation Fault
This is the entire command + output:-
C:\training>mftraining -F font_properties -U unicharset -O eng.unicharset eng.impact.box.tr
Warning: No shape table file present: shapetable
Reading eng.impact.box.tr ...
Font id = -1/0, class id = 1/103 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file....\classify\trainingsampleset.cpp, line 622
I've looked through everything I could find on this (which wasn't much as it is) and I can't figure out what the problem is or what would make it work.
I also tried the shapeclustering command, but that gives me the same error. When I run these under Cygwin, I get a Segmentation Fault instead of the assertion error.
I have found two possible causes of this problem.
Possible cause 1: incorrect font_properties
The font_properties file should contain the content described at:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#font_properties-new-in-301
and the file encoding should meet the requirements of:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#requirements-for-text-input-files
This is the most common answer on the Internet.
(Also make sure you specify the font in font_properties and not the language.)
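For the encoding side of cause 1, here is a rough sketch of a check in Python. It assumes, as I read the linked requirements page, that text input files must be ASCII or UTF-8 without a BOM, use Unix line endings, and end with a newline. The function name check_text_file_bytes is made up for this example and is not part of Tesseract.

```python
def check_text_file_bytes(data):
    """Return a list of likely encoding problems in the raw bytes of a text input file.

    Assumed requirements (from the training wiki, as I understand it):
    UTF-8/ASCII without BOM, "\n" line endings, trailing newline.
    """
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    if b"\r" in data:
        problems.append("contains carriage returns (non-Unix line endings)")
    if not data.endswith(b"\n"):
        problems.append("does not end with a newline")
    return problems
```

You could run it as check_text_file_bytes(open("font_properties", "rb").read()); an empty list means none of these particular problems were found, not that the file is otherwise correct.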
Possible cause 2: wrong training file name
However, I found that trying to fix font_properties didn't work for me, and discovered another cause that gave the same error in my case.
The .tr file names must follow this format:
and not:
(as is seen in some tutorials)
So in my case, this will NOT work:
whereas this small change does work:
In my case the font name in the font_properties file was uppercase, whereas the font name in the .tr file was lowercase. Changing them to the same case solved the problem.
I had the same issue, and changing font_properties as follows fixed it:
from -
batangche 1 0 0 0 0
to -
batangche.exp0 1 0 0 0 0
I was having the same problem, and it was indeed a problem with font_properties. However, in my case, it was solved by making sure that the font in font_properties matched exactly the font name in the .tr file. In my case, that was [fontname].exp0.
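Since several answers here come down to the font name in font_properties not matching the one used by the .tr file (different case, or a missing/extra .exp0 suffix), a small Python sketch of that comparison may help. It assumes you already know the font name your .tr file uses and pass it in by hand; the helper names parse_font_names and explain_mismatch are hypothetical, not Tesseract tools.

```python
def parse_font_names(font_properties_text):
    """Return the first field of every non-empty line in font_properties."""
    names = []
    for line in font_properties_text.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def explain_mismatch(tr_font_name, font_properties_text):
    """Describe how the .tr font name relates to the font_properties entries."""
    names = parse_font_names(font_properties_text)
    if tr_font_name in names:
        return "exact match"
    # Same name apart from upper/lower case.
    for name in names:
        if name.lower() == tr_font_name.lower():
            return "case differs: font_properties has %r" % name
    # One name is the other plus a dotted suffix such as ".exp0".
    for name in names:
        if name.startswith(tr_font_name + ".") or tr_font_name.startswith(name + "."):
            return "suffix differs: font_properties has %r" % name
    return "no similar entry"
```

For example, explain_mismatch("batangche.exp0", "batangche 1 0 0 0 0\n") reports a suffix difference. It only diagnoses the mismatch; which of the two names has to change seems to depend on your setup, judging by the differing answers in this thread.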
You missed the shapeclustering step, which is new in Tesseract 3.02 training.
I had the same problem as you, and it was because the font_properties file was not formatted correctly.
Each line of the font_properties file is formatted as follows: fontname italic bold fixed serif fraktur
Here only the fontname is needed. When I changed the line from lang.fontname.exp0 0 0 0 0 0 to fontname 0 0 0 0 0, my problem was fixed.
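To catch formatting slips in the line layout described above (fontname italic bold fixed serif fraktur), something like the following Python sketch could be used. It assumes each flag is written as 0 or 1 and that the font name contains no spaces; check_font_properties is a name invented for this example.

```python
def check_font_properties(text):
    """Return (line_number, message) pairs for lines that do not look like
    '<fontname> <italic> <bold> <fixed> <serif> <fraktur>'."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines in this sketch
        fields = line.split()
        if len(fields) != 6:
            problems.append((number, "expected 6 fields, found %d" % len(fields)))
            continue
        # Everything after the font name should be a 0/1 flag.
        bad = [f for f in fields[1:] if f not in ("0", "1")]
        if bad:
            problems.append((number, "flags must be 0 or 1, found %s" % ", ".join(bad)))
    return problems
```

An empty result only means the layout looks right; it does not tell you whether the font name itself is the one mftraining expects.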