As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract for 5 more fonts. I need only capital letters and digits (no special characters or symbols).
I followed various processes, for example Adding New Fonts to Tesseract 3 OCR Engine, and also used tools that automate the process, such as Serak Tesseract Trainer for Tesseract 3.02.
For generating box files I used QT Box Editor.
After using the above tools I get an eng.traineddata file. All the tutorials tell me to add this eng.traineddata file to the Tesseract-OCR\tessdata folder, but doing so will replace the original eng.traineddata file. If I do this, will I lose the default fonts that come with Tesseract 3.x?
How can I add new fonts? It's still not clear to me. I hope someone can help me here. Thanks.
You should use a different name, e.g., eng1.traineddata. That way you can use the new data together with the original one by specifying the language option -l eng+eng1.
If you have new trained data for a different font, I think you won't have dictionary correction for your new font.
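To make the "different name" advice concrete, here is a minimal Python sketch. It assumes the newly trained file can simply be copied into tessdata under its own language name (if that does not work for your build, retrain using the new name as the language prefix). The helper names install_traineddata and language_option are made up for illustration and are not part of Tesseract.

```python
import shutil
from pathlib import Path


def install_traineddata(new_file, tessdata_dir, lang_name):
    """Copy a trained data file into tessdata as <lang_name>.traineddata.

    Hypothetical helper: it refuses to overwrite an existing file, so the
    stock eng.traineddata (and its default fonts) stays untouched.
    """
    target = Path(tessdata_dir) / f"{lang_name}.traineddata"
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another name")
    shutil.copyfile(new_file, target)
    return target


def language_option(*lang_names):
    """Join language names into the value passed to tesseract's -l option."""
    return "+".join(lang_names)
```

For example, after install_traineddata("eng.traineddata", r"C:\Tesseract-OCR\tessdata", "eng1") (the install path here is an assumption), language_option("eng", "eng1") gives the string to pass after -l, so both the original and the new data are loaded.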
To add new trained data you can do this (I'm using PHP code here)
Looking at tesseract.php, there is a setLanguage() function, and you can set the language with it.
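For readers not using PHP, the same idea can be sketched in Python. This is only an illustration of what a setLanguage()-style wrapper might do: the class TesseractRunner and its methods are hypothetical, it is not the API of tesseract.php, and it assumes the tesseract executable is on the PATH.

```python
import subprocess


class TesseractRunner:
    """Hypothetical wrapper around the tesseract command line."""

    def __init__(self, image_path, executable="tesseract"):
        self.image_path = image_path
        self.executable = executable
        self.languages = ["eng"]  # default language data

    def set_language(self, *languages):
        """Choose one or more language data sets, e.g. "eng", "eng1"."""
        if not languages:
            raise ValueError("at least one language is required")
        self.languages = list(languages)
        return self

    def build_command(self, output_base):
        """Build: tesseract <image> <output_base> -l lang1+lang2"""
        return [
            self.executable,
            self.image_path,
            output_base,
            "-l",
            "+".join(self.languages),
        ]

    def run(self, output_base):
        """Run tesseract and return the text it wrote to <output_base>.txt."""
        subprocess.run(self.build_command(output_base), check=True)
        with open(output_base + ".txt", encoding="utf-8") as f:
            return f.read()
```

With this sketch, TesseractRunner("plate.png").set_language("eng", "eng1").run("out") would invoke tesseract with -l eng+eng1; the image name plate.png is only a placeholder.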