I am trying to use Tesseract 3.02 with ctypes and cv2 in Python. Tesseract provides a set of C-style APIs exposed through a DLL; one of them is as follows:
TESS_API void TESS_CALL TessBaseAPISetImage(TessBaseAPI* handle, const unsigned char* imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line);
So far, my code is as follows:
tesseract = ctypes.cdll.LoadLibrary('libtesseract302.dll')
api = tesseract.TessBaseAPICreate()
tesseract.TessBaseAPIInit3(api, '', 'eng')
imcv = cv2.imread('test.bmp')
w, h, d = imcv.shape
ret = tesseract.TessBaseAPISetImage(api, ctypes.c_char_p(str(imcv.data)), w, h, d, w * d)
#ret = 44 here
The last line returns an error code 44, which I can't find anywhere in errcode.h provided by Tesseract. I am not sure what I am doing wrong here.
I have found a similar question, "How to recognize data not filename using ctypes and tesseract 3.0.2?", but it was never resolved.
I am also aware of https://code.google.com/p/python-tesseract/. I dug into the source code of this project but wasn't able to find the information I need.
I can confirm the image in test.bmp is legit and readable by calling cv2.imshow. Also, the same image can be OCRed by Tesseract on the command line.
The default restype is c_int, and the default argument conversion from an integer is also c_int. You'll find examples on the web that assume a 32-bit platform that has sizeof(int) == sizeof(void *). This was never a good assumption to make. To protect a 64-bit pointer from truncation when converted to and from a Python integer, set the function pointer's argtypes and restype. It's a good idea to do this anyway, since it allows ctypes to raise an ArgumentError when an argument has the wrong type (and a TypeError when arguments are missing).
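To make the truncation concrete, here is a minimal sketch that needs no Tesseract install. The address is a made-up value, and `fake_set_image` is a hypothetical Python stand-in for a C function, wrapped in a `CFUNCTYPE` prototype so that it carries an explicit restype and argtypes. It assumes a 64-bit Python build.

```python
import ctypes

# Hypothetical 64-bit address, of the kind a C library might return as a handle.
handle = 0x00007F3A12345678

# Carried as a C int (the ctypes default), only the low 32 bits survive.
truncated = ctypes.c_int(handle).value      # 0x12345678

# Carried as a pointer type, the full value survives on a 64-bit build.
preserved = ctypes.c_void_p(handle).value

# A prototype fixes the restype (first argument) and the argtypes (the rest).
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_int)

def _fake_set_image(imagedata, width):
    """Python stand-in for a C function; not part of the Tesseract API."""
    return width

fake_set_image = PROTOTYPE(_fake_set_image)

ok = fake_set_image(None, 640)              # types match the prototype

try:
    fake_set_image(None, '640')             # a str where a c_int is expected
    caught = None
except ctypes.ArgumentError as exc:
    caught = exc                            # rejected before the call is made
```

Without the prototype, ctypes has no way of knowing that the second call is wrong, so the mistake would only show up as odd behavior inside the C function.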
If you'd rather not define the prototypes for every function, then at least set TessBaseAPICreate.restype to an opaque pointer type.
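As a rough illustration of what "opaque pointer type" means here, the sketch below declares a pointer to a structure with no fields and checks that an address survives the round trip through it. The names `_OpaqueAPI` and `OpaqueAPI_p` and the address are invented for the example, and a 64-bit Python build is assumed.

```python
import ctypes

class _OpaqueAPI(ctypes.Structure):
    """Incomplete type: no _fields_, so it is only ever used behind a pointer."""

OpaqueAPI_p = ctypes.POINTER(_OpaqueAPI)

# With a real library this type would be assigned as the restype, e.g.
#     lib.TessBaseAPICreate.restype = OpaqueAPI_p
# Here a made-up address stands in for the handle the library would return.
address = 0x00007F3A12345678
handle = ctypes.cast(ctypes.c_void_p(address), OpaqueAPI_p)

# The handle is a ctypes pointer instance, not a Python integer, so it is
# passed back to C at full pointer width even when argtypes is not set.
round_trip = ctypes.cast(handle, ctypes.c_void_p).value
```

A pointer class is a better fit than a plain `c_void_p` restype for this purpose: a `c_void_p` result is handed back as a Python integer, which would again be converted as a `c_int` when passed to a function that has no argtypes.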
The following ctypes definitions are based on the header api/capi.h. For convenience I've packaged the API into a Tesseract class.
import sys
import cv2
import ctypes
import ctypes.util

if sys.platform == 'win32':
    LIBNAME = 'libtesseract302'
else:
    LIBNAME = 'tesseract'

class TesseractError(Exception):
    pass

class Tesseract(object):
    _lib = None
    _api = None

    # Opaque pointer type for the TessBaseAPI* handle.
    class TessBaseAPI(ctypes._Pointer):
        _type_ = type('_TessBaseAPI', (ctypes.Structure,), {})

    @classmethod
    def setup_lib(cls, lib_path=None):
        # Load the library and declare the prototypes once per process.
        if cls._lib is not None:
            return
        if lib_path is None:
            lib_path = ctypes.util.find_library(LIBNAME)
            if lib_path is None:
                raise TesseractError('tesseract library not found')
        cls._lib = lib = ctypes.CDLL(lib_path)

        # source:
        # https://github.com/tesseract-ocr/tesseract/
        #         blob/3.02.02/api/capi.h

        lib.TessBaseAPICreate.restype = cls.TessBaseAPI

        lib.TessBaseAPIDelete.restype = None # void
        lib.TessBaseAPIDelete.argtypes = (
            cls.TessBaseAPI,) # handle

        lib.TessBaseAPIInit3.argtypes = (
            cls.TessBaseAPI, # handle
            ctypes.c_char_p, # datapath
            ctypes.c_char_p) # language

        lib.TessBaseAPISetImage.restype = None
        lib.TessBaseAPISetImage.argtypes = (
            cls.TessBaseAPI, # handle
            ctypes.c_void_p, # imagedata
            ctypes.c_int,    # width
            ctypes.c_int,    # height
            ctypes.c_int,    # bytes_per_pixel
            ctypes.c_int)    # bytes_per_line

        lib.TessBaseAPIGetUTF8Text.restype = ctypes.c_char_p
        lib.TessBaseAPIGetUTF8Text.argtypes = (
            cls.TessBaseAPI,) # handle

    def __init__(self, language='eng', datapath=None, lib_path=None):
        if self._lib is None:
            self.setup_lib(lib_path)
        # c_char_p parameters take byte strings, so encode text strings
        # first (this matters on Python 3, where 'eng' is not bytes).
        if isinstance(language, type(u'')):
            language = language.encode('utf-8')
        if isinstance(datapath, type(u'')):
            datapath = datapath.encode('utf-8')
        self._api = self._lib.TessBaseAPICreate()
        # TessBaseAPIInit3 returns nonzero on failure.
        if self._lib.TessBaseAPIInit3(self._api, datapath, language):
            raise TesseractError('initialization failed')

    def __del__(self):
        if not self._lib or not self._api:
            return
        if not getattr(self, 'closed', False):
            self._lib.TessBaseAPIDelete(self._api)
            self.closed = True

    def _check_setup(self):
        if not self._lib:
            raise TesseractError('lib not configured')
        if not self._api:
            raise TesseractError('api not created')

    def set_image(self, imagedata, width, height,
                  bytes_per_pixel, bytes_per_line=None):
        self._check_setup()
        # Default assumes tightly packed rows with no padding.
        if bytes_per_line is None:
            bytes_per_line = width * bytes_per_pixel
        self._lib.TessBaseAPISetImage(self._api,
                                      imagedata, width, height,
                                      bytes_per_pixel, bytes_per_line)

    def get_utf8_text(self):
        # Returns the raw UTF-8 encoded byte string.
        self._check_setup()
        return self._lib.TessBaseAPIGetUTF8Text(self._api)

    def get_text(self):
        # Returns the recognized text decoded to a text string.
        self._check_setup()
        result = self._lib.TessBaseAPIGetUTF8Text(self._api)
        if result:
            return result.decode('utf-8')
Example usage:
if __name__ == '__main__':
    imcv = cv2.imread('ocrtest.png')
    # NumPy image arrays are indexed (row, column, channel).
    height, width, depth = imcv.shape
    tess = Tesseract()
    tess.set_image(imcv.ctypes, width, height, depth)
    text = tess.get_text()
    print(text.strip())
I tested this on Linux with libtesseract.so.3. Note that cv2.imread returns a NumPy array. This has a ctypes attribute that includes the _as_parameter_ hook, set as a c_void_p pointer to the array. Note also that the code shown in the question has the width and height transposed. It should have been h, w, d = imcv.shape.
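Both points can be checked without an image file or Tesseract. The sketch below uses a zero-filled NumPy array as a stand-in for the result of cv2.imread (the 2x5 size is arbitrary) and inspects the shape order, the row stride, and the pointer that ctypes would receive.

```python
import ctypes
import numpy as np

# Stand-in for cv2.imread(): 2 rows, 5 columns, 3 channels of 8-bit data.
img = np.zeros((2, 5, 3), dtype=np.uint8)

# The shape is (rows, columns, channels), so height comes first.
height, width, depth = img.shape

# Bytes from the start of one row to the start of the next; for a
# C-contiguous array this equals width * depth.
bytes_per_line = img.strides[0]

# The hook that ctypes looks up when img.ctypes is passed as an argument.
param = img.ctypes._as_parameter_
address = img.ctypes.data               # the same address as a Python integer
```

Reading `img.strides[0]` rather than computing `width * depth` is one way to get bytes_per_line that stays correct if the rows happen to be padded, though a sliced or otherwise non-contiguous array may still need `np.ascontiguousarray` before it is handed to C.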
ocrtest.png:
Output:
I am trying to use Tesseract 3.02 with ctypes and cv2 in python. Tesseract
provides a DLL exposed set of C style APIs, one of them is as following: