Chinese character recognition using Tesseract OCR

Posted 2019-09-03 14:23

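I have been using the Tesseract 3.0.2 OCR SDK to extract text from images. However, when I run OCR on an image that contains Chinese text, Tesseract does not return the Chinese characters; I get digits and English letters instead. I need the Chinese characters that appear in the image.

How can I do this? Is there a way to get the Chinese characters rather than any other characters?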

Answer 1:

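You need to download the Chinese trained data (a file named something like chi_sim.traineddata) and add it to your tessdata folder.

Download the file from https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata

and use it like this: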

Tesseract* tesseract= [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"chi_sim"];
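The initializer above can only load Chinese if chi_sim.traineddata is already in the tessdata folder. As a rough sketch of that download step, assuming you fetch the file on your development machine with Python before adding the folder to your project (the helper names `traineddata_path` and `ensure_traineddata` are made up for illustration and are not part of Tesseract):

```python
import os
import shutil
import urllib.request

# URL quoted in the answer above.
CHI_SIM_URL = "https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata"


def traineddata_path(tessdata_dir, language):
    """Return the expected location of the data file for `language` (hypothetical helper)."""
    return os.path.join(tessdata_dir, language + ".traineddata")


def ensure_traineddata(tessdata_dir, language, url):
    """Download the trained data file unless it is already present (hypothetical helper)."""
    target = traineddata_path(tessdata_dir, language)
    if os.path.exists(target):
        return target
    os.makedirs(tessdata_dir, exist_ok=True)
    # Write to a temporary name first so an interrupted transfer
    # does not leave a truncated .traineddata file behind.
    partial = target + ".part"
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, target)
    return target


if __name__ == "__main__":
    # Assumes the folder is called "tessdata", matching the data path used above.
    print(ensure_traineddata("tessdata", "chi_sim", CHI_SIM_URL))
```

The language code passed to the initializer ("chi_sim") has to match the file name prefix, which is why the sketch builds the file name from the same string.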

If you run into any problems, you can download my Tesseract experiment (with Chinese language support): https://github.com/aryansbtloe/ExperimentWithTesseract.git

I have tested this one. I hope you find it useful.

Source: chinese character recognition using Tesseract OCR
