Question:
I have a problem running the tesseract-ocr engine on Linux. I downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I run tesseract with the command tesseract blob.jpg out -l rus, it displays an error:
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language eng
Tesseract couldn't load any languages!
Could not initialize tesseract.
Following the compilation guide, I used export TESSDATA_PREFIX='/usr/local/share/' to point to my tessdata directory.
Maybe I should edit some config files? Tesseract tries to load the 'eng' data files instead of 'rus'.
Screenshot:
http://i.stack.imgur.com/I0Guc.png
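The error message names the file Tesseract failed to open, so one quick check is whether each expected file exists under the directory that TESSDATA_PREFIX implies. Below is a minimal Python sketch, assuming the layout the error message describes (TESSDATA_PREFIX is the parent of the tessdata directory). The helper name missing_languages and its default prefix are illustrative, not part of Tesseract.

```python
import os
from pathlib import Path


def missing_languages(langs, prefix=None):
    """Return the language codes whose .traineddata file is absent.

    Assumption: TESSDATA_PREFIX is the *parent* of the tessdata
    directory, as the error message in the question states.
    """
    if prefix is None:
        # Fall back to the path used in the question if the variable is unset.
        prefix = os.environ.get("TESSDATA_PREFIX", "/usr/local/share/")
    tessdata = Path(prefix) / "tessdata"
    return [lang for lang in langs
            if not (tessdata / f"{lang}.traineddata").is_file()]
```

Calling missing_languages(["eng", "rus"]) on a machine set up as in the question would be expected to report eng as missing, which matches the file named in the error.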
Answer 1:
You can grab eng.traineddata from Google (compressed):
wget https://tesseract-ocr.googlecode.com/files/eng.traineddata.gz
or Github (raw):
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
Check https://github.com/tesseract-ocr/tessdata for a full list of trained language data.
Once you have the file(s), move them to the /usr/local/share/tessdata folder. Warning: some Linux distributions (such as openSUSE and Ubuntu) may expect them in /usr/share/tessdata instead.
# If you got the data from Google, unzip it first!
gunzip eng.traineddata.gz
# Move the data
sudo mv -v eng.traineddata /usr/local/share/tessdata/
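The download-and-move steps can also be scripted. The following is a rough Python sketch, not a definitive installer: the URL pattern is copied from the GitHub link in this answer (the branch name may have changed since), and the function names traineddata_url and install_traineddata are hypothetical. Writing into /usr/local/share/tessdata normally requires root privileges.

```python
import urllib.request
from pathlib import Path

# URL pattern taken from the GitHub link in this answer; the branch
# name is an assumption and may differ in the current repository.
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/master"


def traineddata_url(lang):
    """Build the raw download URL for one language code."""
    return f"{BASE_URL}/{lang}.traineddata"


def install_traineddata(lang, tessdata_dir, fetch=urllib.request.urlretrieve):
    """Download <lang>.traineddata into tessdata_dir and return its path.

    `fetch` takes (url, filename); it is a parameter so the download
    step can be swapped out, for example in tests.
    """
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(traineddata_url(lang), str(target))
    return target
```

For example, install_traineddata("eng", "/usr/local/share/tessdata") would fetch the English data into the folder used above.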
Answer 2:
The simplest way is to install the needed package:
sudo apt-get install tesseract-ocr-eng #for english
sudo apt-get install tesseract-ocr-tam #for tamil
sudo apt-get install tesseract-ocr-deu #for deutsch (German)
As you can see, the same pattern works for other languages (e.g. tesseract-ocr-fra).
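The package naming pattern can be expressed as a small helper. This is a sketch under the assumption that every language package is named tesseract-ocr-<code>, with underscores in the code written as hyphens; the function name apt_install_command is made up for illustration.

```python
def apt_install_command(langs):
    """Build the apt-get command for the given Tesseract language codes."""
    # Assumption: packages follow the tesseract-ocr-<code> pattern shown
    # above, with underscores in the code written as hyphens.
    packages = [f"tesseract-ocr-{lang.replace('_', '-')}" for lang in langs]
    return ["sudo", "apt-get", "install", *packages]
```

The returned list can be passed to subprocess.run, or joined with spaces to print the command for review before running it.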
Answer 3:
I had this error too, on a Windows machine.
My solution:
1) Download your language files from
https://github.com/tesseract-ocr/tessdata/tree/3.04.00
For example, for eng, I downloaded all files with the eng prefix.
2) Put them into a tessdata directory inside some folder. Add a system environment variable named TESSDATA_PREFIX that points to this folder.
The result will be:
System env var: TESSDATA_PREFIX=D:/Java/OCR
The OCR folder contains a tessdata folder with the language files.
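If changing the system-wide variable is inconvenient, the same variable can be set for a single invocation instead. Here is a minimal Python sketch, assuming the tesseract executable is on PATH; the function names env_with_tessdata_prefix and run_ocr are hypothetical.

```python
import os
import subprocess


def env_with_tessdata_prefix(prefix, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = prefix
    return env


def run_ocr(image, out_base, lang, prefix):
    """Run the tesseract CLI with TESSDATA_PREFIX set for this call only."""
    return subprocess.run(
        ["tesseract", image, out_base, "-l", lang],
        env=env_with_tessdata_prefix(prefix),
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
```

With the folder from this answer, run_ocr("blob.jpg", "out", "eng", "D:/Java/OCR") would leave the rest of the system's environment untouched.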
Answer 4:
None of the previous solutions worked for me.
I installed the data both with apt-get and by downloading the tessdata manually, moved it around under /usr, and so on, and nothing worked, even though I exported the variable a thousand times.
Finally, on a last try before starting to cry, I passed the path directly to the Tesseract() instance.
In Python: tr = Tesseract("/usr/local/share/tesseract-ocr/")
and now it works. To clarify, I'm using the tesserwrap module.
Answer 5:
You can call the tesseract API functions from C++ code:
#include <tesseract/baseapi.h>
#include <tesseract/ocrclass.h> // ETEXT_DESC
using namespace tesseract;
class TessAPI : public TessBaseAPI {
public:
void PrintRects(int len);
};
...
TessAPI *api = new TessAPI();
int res = api->Init(NULL, "rus");
api->SetAccuracyVSpeed(AVS_MOST_ACCURATE);
api->SetImage(data, w0, h0, bpp, stride);
api->SetRectangle(x0,y0,w0,h0);
char *text;
ETEXT_DESC monitor;
api->RecognizeForChopTest(&monitor);
text = api->GetUTF8Text();
printf("text: %s\n", text);
printf("m.count: %d\n", monitor.count);       // integer field, so %d rather than %s
printf("m.progress: %d\n", monitor.progress); // integer field, so %d rather than %s
api->RecognizeForChopTest(&monitor);
text = api->GetUTF8Text();
printf("text: %s\n", text);
...
api->End();
And build this code:
g++ -g -I. -I/usr/local/include -o _test test.cpp -ltesseract_api -lfreeimageplus
(I need FreeImage for picture loading.)
Answer 6:
I'm using Visual Studio 2017 Community Edition.
I solved this problem by making a directory called tessdata in the Debug directory of my project. Then I put the eng.traineddata file into said directory.
Answer 7:
tesseract --tessdata-dir <tessdata-folder> <image-path> stdout --oem 2 -l <lng>
In my case, these were the mistakes I made, or the attempts that did not succeed:
- I cloned the GitHub repo and copied the files from there to
  - /usr/local/share/tessdata/
  - /usr/share/tesseract-ocr/tessdata/
  - /usr/share/tessdata/
- Used TESSDATA_PREFIX with the above paths
- sudo apt-get install tesseract-ocr-eng
The first 2 attempts did not work because the files from git clone did not work, for reasons that I do not know. I am not sure why the #3 attempt worked for me.
Finally,
- I downloaded the eng.traineddata file using wget
- Copied it to some folder
- Used --tessdata-dir with the folder name
The takeaway for me is to learn the tool well and make use of it, rather than relying on package manager installation and folders.
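The --tessdata-dir invocation above can be wrapped for use from a script. This is a small Python sketch, assuming tesseract is on PATH and that the given folder directly contains the .traineddata file. The function names are hypothetical, and the --oem option from the command above is left out for brevity.

```python
import subprocess


def tesseract_command(image, lang, tessdata_dir):
    """Build a tesseract call that reads language data from tessdata_dir."""
    return ["tesseract", "--tessdata-dir", str(tessdata_dir),
            str(image), "stdout", "-l", lang]


def ocr_to_text(image, lang, tessdata_dir):
    """Run tesseract and return the recognized text it writes to stdout."""
    result = subprocess.run(tesseract_command(image, lang, tessdata_dir),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the folder explicitly on every call avoids any dependence on TESSDATA_PREFIX or on where a package manager placed the data.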