mftraining gives Warning: no protos/configs for F

Posted 2020-02-15 02:03

EDIT: mftraining gives the warning in the title for all the characters in the unicharset (so not just F, but also a, b, c, d, etc.). How do I create these protos/configs?

I'm following this tutorial


Previous question, which is now solved:
Error:Assert failed:in file ....\classify\trainingsampleset.cpp, line 622 / Segmentation Fault
This is the entire command + output:-

C:\training>mftraining -F font_properties -U unicharset -O eng.unicharset eng.impact.box.tr
Warning: No shape table file present: shapetable
Reading eng.impact.box.tr ...
Font id = -1/0, class id = 1/103 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file....\classify\trainingsampleset.cpp, line 622

I've looked through everything I could find on this (which wasn't much as it is), but I can't figure out what the problem is or what would make it work.

I also tried the shapeclustering command, but that gives me the same error. Also, when I run these under Cygwin, I get a Segmentation Fault instead of the assertion error.

7 answers
倾城 Initia
#2 · 2020-02-15 02:12

I have found two possible causes of this problem.

Possible cause 1: incorrect font_properties

The font_properties file should contain the content described at:

https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#font_properties-new-in-301

and the file encoding should meet the requirements of:

https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#requirements-for-text-input-files

This is the most common answer on the Internet.

(Also make sure you specify the font in font_properties and not the language.)

Possible cause 2: wrong training file name

However, I found that trying to fix font_properties didn't work for me, and I discovered another cause that gave the same error in my case.

The names of the .tr files must follow this format:

<language>.<fontname>.exp<num>.tr

and not:

<language>.<fontname>.exp<num>.box.tr

(as is seen in some tutorials)

So in my case, this will NOT work:

tesseract eng.unknown.exp1.png eng.unknown.exp1.box nobatch box.train
unicharset_extractor eng.unknown.exp1.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.unknown.exp1.box.tr

whereas this small change does work:

tesseract eng.unknown.exp1.png eng.unknown.exp1 nobatch box.train
unicharset_extractor eng.unknown.exp1.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.unknown.exp1.tr
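
In case it helps, here is a minimal shell sketch of the naming rule above. The helper names tr_base and is_valid_tr_name are made up for illustration and are not part of Tesseract; the pattern only encodes the <language>.<fontname>.exp<num>.tr form described in this answer.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Tesseract) for the naming rule above.

# Strip the image extension to get the output base to pass to tesseract,
# e.g. eng.unknown.exp1.png -> eng.unknown.exp1
tr_base() {
  printf '%s\n' "${1%.*}"
}

# Succeed only for names of the form <language>.<fontname>.exp<num>.tr
is_valid_tr_name() {
  printf '%s\n' "$1" | grep -Eq '^[^.]+\.[^.]+\.exp[0-9]+\.tr$'
}

# Small demo on the two names from this answer.
for f in eng.unknown.exp1.tr eng.unknown.exp1.box.tr; do
  if is_valid_tr_name "$f"; then
    echo "$f: ok"
  else
    echo "$f: unexpected name"
  fi
done
```

The output base for the first step can then be written as "$(tr_base eng.unknown.exp1.png)" instead of typing eng.unknown.exp1 by hand, which avoids accidentally appending .box.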
我欲成王,谁敢阻挡
#3 · 2020-02-15 02:19

In my case, the font name in the font_properties file was uppercase, whereas the font name in the .tr file was lowercase. Changing them to the same case solved the problem.

Juvenile、少年°
#4 · 2020-02-15 02:27

I had the same issue, and changing font_properties as follows fixed it:

from - batangche 1 0 0 0 0

to - batangche.exp0 1 0 0 0 0

狗以群分
#5 · 2020-02-15 02:31

I was having the same problem, and it was indeed a problem with font_properties. However, in my case, it was solved by making sure that the font in font_properties matched exactly the font name in the .tr file. In my case, that was [fontname].exp0.
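
A rough way to check for an exact match from the shell is sketched below. It assumes you have already read the font name off the .tr file yourself, and that the font name is the first whitespace-separated field on each font_properties line. The function name font_listed is hypothetical, not a Tesseract tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if font_properties (argument 1) has a line
# whose first field equals the given font name (argument 2) exactly.
# The comparison is case-sensitive and does not ignore suffixes such as .exp0.
font_listed() {
  awk -v name="$2" '
    # Appending "" forces a string comparison rather than a numeric one.
    ($1 "") == (name "") { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Demo with a throwaway file; the entries are only examples.
demo=$(mktemp)
printf 'Impact 0 0 0 0 0\n' > "$demo"
for name in Impact impact Impact.exp0; do
  if font_listed "$demo" "$name"; then
    echo "$name: listed"
  else
    echo "$name: not listed"
  fi
done
rm -f "$demo"
```

If the name from the .tr file comes back as not listed, a case difference or a missing or extra .exp0 suffix is the first thing to look at.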

聊天终结者
#6 · 2020-02-15 02:33

You are missing the shapeclustering step, which is new in Tesseract 3.02 training.

够拽才男人
#7 · 2020-02-15 02:35

I had the same problem as you. It was because my font_properties file was not formatted correctly.

Each line of the font_properties file is formatted as follows: fontname italic bold fixed serif fraktur

Here, only the font name itself is needed in the first field. When I changed the line from lang.fontname.exp0 0 0 0 0 0 to fontname 0 0 0 0 0, my problem was fixed.
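
For what it's worth, the line format above can be sanity-checked with a small awk sketch like the one below. It assumes fields are separated by whitespace and that each of the five flags is 0 or 1. The function name check_font_properties is made up for this example. It checks only the shape of each line, not whether the font name matches the one in the .tr file.

```shell
#!/bin/sh
# Hypothetical helper: report font_properties lines that do not look like
#   fontname italic bold fixed serif fraktur
# with each flag assumed to be 0 or 1. Exits non-zero if any line is bad.
check_font_properties() {
  awk '
    NF == 0 { next }                       # skip blank lines
    {
      ok = (NF == 6)                       # one name plus five flags
      for (i = 2; i <= NF && ok; i++)
        if ($i !~ /^[01]$/) ok = 0         # flags must be 0 or 1
      if (!ok) { printf "line %d: %s\n", NR, $0; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Demo with a throwaway file: the second line is missing one flag.
demo=$(mktemp)
printf 'fontname 0 0 0 0 0\nother 0 0 0 0\n' > "$demo"
check_font_properties "$demo" || echo "font_properties has malformed lines"
rm -f "$demo"
```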
