Apache POI: Retrieve page number from XWPFParagrap

2019-09-17 05:21发布

问题:

I am iterating over XWPFParagraph instances coming from an XWPTDocument instance (using the "getParagraphs()" method) Is there a way to retrieve the page numbers where each paragraph is located from the XWPFParagraph instances?

回答1:

To eventually turn Gagravarr's comment into a proper answer: No, this is not possible.

Doing so would require a full-blown Word rendering engine (i.e. MS Word itself) and even then you cannot be absolutely sure that page breaks will always occur at exactly those positions where they happened to be when the file was once created (think: missing fonts, missing pictures, different display options for vanished text and/or revision marking, different printer margins, etc.).

So claiming that some content in a Word file is on a certain line X on a certain page Y actually expresses a fundamental misunderstanding of the Word file format. There is simply no notion of line and page in there. It's all about runs resp. ranges.

In other words: Only upon opening such a file with MS Word will those contents be rendered onto individual lines / pages. And the behavior of this renderer unpredictable to a certain extent.