Question:

Tesseract 3 is able to perform page layout analysis. However, I couldn't find any sample code or documentation on how to use the library for such purposes. I hope someone here can explain how to perform layout analysis on an image and how to parse the resulting data.

Answer 1:

Tesseract accepts a page segmentation mode parameter (-psm; Tesseract 4 and later spell it --psm), which can take the following values:

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

Example:

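tesseract image.tif image -l eng -psm 0

(The second argument is an output base name; Tesseract appends the file extension itself, so it is normally given without .txt.)

However, I am not sure whether it is possible to use the layout analysis in standalone mode.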

Answer 2:

First, initialize a TessBaseAPI instance. Use either Init() (if you also want to perform text recognition) or InitForAnalysePage() (if you are only interested in the text boxes).

Second, set the image using SetImage().

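Finally, call AnalyseLayout() to get a PageIterator, which provides the text boxes. AnalyseLayout() returns NULL on error or on an empty page, and the caller must delete the returned iterator.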

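#include <tesseract/baseapi.h>

int main() {
    tesseract::TessBaseAPI tessApi;
    tessApi.InitForAnalysePage();

    // tessApi.SetImage(...);  // e.g. a Leptonica Pix*

    tesseract::PageIterator *iter = tessApi.AnalyseLayout();

    // RIL_WORD is a PageIteratorLevel (not a PageSegMode); RIL_BLOCK,
    // RIL_PARA, RIL_TEXTLINE or RIL_SYMBOL can be used instead.
    // Use do/while: calling Next() before the first BoundingBox()
    // would skip the first element.
    if (iter != nullptr) {
        do {
            int left, top, right, bottom;

            // (left, top) is the top-left corner, (right, bottom) the bottom-right.
            iter->BoundingBox(
                    tesseract::RIL_WORD,
                    &left, &top, &right, &bottom
            );
        } while (iter->Next(tesseract::RIL_WORD));
        delete iter;  // the caller owns the iterator returned by AnalyseLayout()
    }
    return 0;
}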

Answer 3:

Not sure if this exactly answers your question, but I landed here looking for a way to get the bounding-box coordinates (and, optionally, the text recognised inside each box) for an input image. This is now possible with tesseract:

$> tesseract test.tiff test -l eng -psm 1 tsv

The parameters to note in the command above are 'psm' and 'tsv': 'psm' selects the page segmentation mode, and 'tsv' produces a tab-separated output file with everything you are likely to need about the text in the image (page, block, paragraph, line and word numbers, bounding-box coordinates, confidence and recognised text), as shown below:

level  page_num  block_num  par_num  line_num  word_num  left  top  width  height  conf  text
1      1         0          0        0         0         0     0    5500   4250    -1
2      1         1          0        0         0         327   285  2218   53      -1
3      1         1          1        0         0         327   285  2218   53      -1
4      1         1          1        1         0         327   285  2218   53      -1
5      1         1          1        1         1         327   285  246    38      87    INFOPAC
5      1         1          1        1         2         620   287  165    38      87    PAGE
5      1         1          1        1         3         952   290  100    37      95    NAME
5      1         1          1        1         4         1173  292  1082   45      39    ENTRYDATE
5      1         1          1        1         5         2333  302  212    36      48    EMAIL

Answer 4:

There is an option since 3.04:

tesseract -c preserve_interword_spaces=1 test.tif test

Here is a reference to what looks like the related development thread.

Page layout analysis using Tesseract?
