Can I use OCR to detect font style (bold, italic)?

Posted 2019-03-11 11:57

I am interested in using OCR to extract bold and italic words from a simple text. For example, if I input a clear image with text like so:

"The quick brown fox jumps over the lazy dog."

I would like to get an output like so: bold("brown", "jumps"), italic("lazy")

I have looked into doing this with OCRopus or Tesseract, but the documentation is poor and I can't tell if it's possible, or how to do it if it is.

Tags: ocr font-face tesseract

2 answers

forever°为你锁心

Reply #2 · 2019-03-11 12:23

Tesseract 3.0.1 (from trunk) has such a function. A new class, ResultIterator, was added to the API, and it provides the function you are interested in:

 WordFontAttributes(bool* is_bold,
                    bool* is_italic,
                    bool* is_underlined,
                    bool* is_monospace,
                    bool* is_serif,
                    bool* is_smallcaps,
                    int* pointsize,
                    int* font_id).

Actually you can see it yourself from here.
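
As a rough sketch of how that call could produce the output asked for in the question, here is a Python version. It assumes the third-party tesserocr binding, whose word iterator exposes WordFontAttributes() as a dict with "bold" and "italic" keys (or None when no font information is available); the helper names group_by_style and read_word_styles are made up for this example. Font attributes may also be empty depending on the Tesseract version and recognition engine in use, so treat this as a starting point rather than a guaranteed recipe.

```python
from collections import defaultdict


def group_by_style(words):
    """Group (text, attrs) pairs into {"bold": [...], "italic": [...]}.

    `attrs` is a dict with boolean "bold" and "italic" entries, or None
    when the engine reported no font information for the word.
    """
    groups = defaultdict(list)
    for text, attrs in words:
        if not attrs:
            continue  # no font information for this word
        for style in ("bold", "italic"):
            if attrs.get(style):
                # Drop punctuation attached to the word, e.g. "dog."
                groups[style].append(text.strip(".,;:!?\"'"))
    return dict(groups)


def read_word_styles(image_path):
    """Yield (text, attrs) for each recognized word (assumes tesserocr)."""
    # Imported lazily so group_by_style stays usable without tesserocr.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        for word in iterate_level(api.GetIterator(), RIL.WORD):
            yield word.GetUTF8Text(RIL.WORD), word.WordFontAttributes()
```

With tesserocr installed, group_by_style(read_word_styles("fox.png")) would return the bold and italic words found in a (hypothetical) image file fox.png.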


祖国的老花朵

Reply #3 · 2019-03-11 12:24

Tesseract 3.0x's XML-based hOCR output format includes character attributes. You may want to try that.

http://code.google.com/p/tesseract-ocr/issues/detail?id=377#c5
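
If you go the hOCR route, the styled words can be pulled out with a small parser. The sketch below uses only the Python standard library and assumes that bold words are wrapped in <strong> and italic words in <em> in the hOCR output; check this against the output of your Tesseract version before relying on it. The class and function names, and the sample markup, are written by hand for illustration and are not real Tesseract output.

```python
from html.parser import HTMLParser


class StyledWordParser(HTMLParser):
    """Collect text that appears inside <strong> (bold) and <em> (italic)."""

    def __init__(self):
        super().__init__()
        self.depth = {"strong": 0, "em": 0}  # open-tag nesting counters
        self.bold, self.italic = [], []

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # whitespace between word spans
        if self.depth["strong"]:
            self.bold.append(text)
        if self.depth["em"]:
            self.italic.append(text)


def styled_words(hocr):
    """Return {"bold": [...], "italic": [...]} for an hOCR string."""
    parser = StyledWordParser()
    parser.feed(hocr)
    parser.close()
    return {"bold": parser.bold, "italic": parser.italic}


# Hand-written sample in the style of hOCR, not real Tesseract output.
SAMPLE = (
    "<span class='ocr_line'>"
    "<span class='ocrx_word'>quick</span> "
    "<span class='ocrx_word'><strong>brown</strong></span> "
    "<span class='ocrx_word'><em>lazy</em></span>"
    "</span>"
)
print(styled_words(SAMPLE))
```

A word that is both bold and italic (nested tags) ends up in both lists.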
