Image processing for OCR with leptonica (inverse c

Posted 2019-02-01 01:53

I am trying to process the following image with leptonica to extract text with tesseract.

Original image: [image]

Tesseract on the original image yields this:

i s l
D2J1FiiE-l191x1iitmwii9 uhiaiislz-2 Q ~37
Bottom linez
With a little time!
you can learn social media technology
using free online resources-
And if you donity
youlll be at a significant disadvantage
to
other HOn-pFOiiTS-

Not great, especially over the busy background at the top. So, using leptonica, I apply a background-removal algorithm (blur, difference, threshold, invert) to get the following image: [processed image]
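The blur/difference/threshold/invert pipeline described above can be sketched roughly as follows. This is not the poster's actual leptonica code; it is a minimal numpy/scipy approximation, and the `radius` and `thresh` parameters are illustrative values, not ones taken from the question.

```python
import numpy as np
from scipy import ndimage

def remove_background(gray, radius=7, thresh=25):
    """Approximate blur/difference/threshold/invert background removal.

    gray: 2-D uint8 grayscale image (0 = black, 255 = white).
    Returns a uint8 image with text rendered black on a white page.
    """
    # 1. Blur: a wide box filter estimates the slowly varying background.
    bg = ndimage.uniform_filter(gray.astype(np.float64), size=2 * radius + 1)
    # 2. Difference: pixels far from the local background are likely text.
    diff = np.abs(gray.astype(np.float64) - bg)
    # 3. Threshold + invert: keep strong deviations as black text on white.
    return np.where(diff > thresh, 0, 255).astype(np.uint8)
```

Taking the absolute difference makes the result polarity-independent (it catches both dark-on-light and light-on-dark text), and writing the kept pixels as black on white folds the "invert" step into the threshold.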

But tesseract doesn't do a good job with it:

@@r-mair lkrm@W lh@w ilr@ mJs@ iklh@ ii@c2lhm1@ll
mm Mime
VWU1 a Mitt-Jle time-
@1m ll@@Wn Om @@@lh1
using free onhne resources-
Andifyoudoni
9110 ate a $0 D
to other non-profrts
I

The main problem, it seems, is that all of the text is now outlined instead of solid. How can I adjust my algorithm, or what can I add, to make the text solid?
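One common way to turn outlined glyphs back into solid ones is a morphological closing (to bridge small gaps in each outline) followed by hole filling. This is only a generic sketch of that idea using scipy, not something from the question or the answer below, and the 3×3 structuring element is an arbitrary choice:

```python
import numpy as np
from scipy import ndimage

def solidify_outlines(outlined):
    """outlined: boolean array, True on character-outline pixels.

    Returns a boolean array where each closed outline has been
    filled into a solid glyph.
    """
    # Close 1-2 px breaks so each outline forms a closed contour.
    closed = ndimage.binary_closing(outlined, structure=np.ones((3, 3), bool))
    # Fill the enclosed interiors to get solid strokes.
    return ndimage.binary_fill_holes(closed)
```

The caveat is that `binary_fill_holes` only fills regions that are fully enclosed; outlines with gaps wider than the closing can bridge will stay hollow.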

1 Answer

虎瘦雄心在 · answered 2019-02-01 02:39

This paper appears to propose a binarization method that solves your problem:

T. Kasar, J. Kumar and A. G. Ramakrishnan. "Font and Background Color Independent Text Binarization." (2007)

[Image: Kasar et al. method performance]
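The core idea of that paper is to binarize from edges rather than from intensity: each connected edge component is treated as a candidate character and filled into a solid region, so outlined glyphs become solid regardless of their original polarity. A very loose scipy sketch of just that filling step (it omits the paper's edge detection, polarity inference, and component filtering entirely) might look like:

```python
import numpy as np
from scipy import ndimage

def fill_edge_components(edges):
    """edges: boolean edge map (e.g. from a Canny detector, not shown).

    Renders every closed edge contour as a solid black glyph on a
    white page; open contours with no enclosed interior are skipped.
    """
    labels, n = ndimage.label(edges)
    out = np.full(edges.shape, 255, np.uint8)  # white page
    for i in range(1, n + 1):
        comp = labels == i
        region = ndimage.binary_fill_holes(comp)  # solid candidate glyph
        if region.sum() == comp.sum():
            continue  # open contour: nothing was filled
        out[region] = 0  # draw the glyph solid black
    return out
```

The real method additionally decides, per component, whether the text is darker or lighter than its local background, which is what makes it font- and background-color independent.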
