How to read a DOCX file page by page using Apache POI

Published 2019-08-27 06:19

Question:

I would like to read a DOCX file and search it for a particular text. The program should print the document name and the page on which the text was found. I have written this simple method, but the page count never increases:

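    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;

    private static void searchDocx(File file, String searchText) throws IOException {
        String needle = searchText.toLowerCase();
        // try-with-resources closes both the stream and the document
        try (FileInputStream fis = new FileInputStream(file);
             XWPFDocument document = new XWPFDocument(fis)) {

            int pageNo = 1;
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                String text = paragraph.getText();
                // Case-insensitive match against the paragraph text
                if (text != null && text.toLowerCase().contains(needle)) {
                    System.out.println("found on page: " + pageNo + " in: " + file.getAbsolutePath());
                }
                // Intended to advance the page counter; in practice this never fires
                if (paragraph.isPageBreak()) {
                    pageNo++;
                }
            }
        }
    }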

How can I read the file so that I can print the page on which searchText was found? Is there any way to determine the page when reading a DOCX file with Apache POI?
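
For context, a DOCX file does not store page numbers; pagination is computed by the word processor when it lays out the document. What the file can contain are markers: explicit page breaks (`<w:br w:type="page"/>` inside a run) and, if the file was last saved by Word, `<w:lastRenderedPageBreak/>` hints. `isPageBreak()` only reflects the paragraph's "page break before" property, which would explain why the counter above stays at 1. The following is a minimal sketch of counting those markers, not a definitive implementation. It deliberately swaps POI's object model for plain JDK ZIP and StAX parsing of `word/document.xml`, so it runs without extra dependencies; the same elements should also be reachable in POI through `XWPFRun.getCTR()`. The class name `DocxPageSearch`, the method `findPages`, and the rule that a rendered break directly after an explicit break counts only once are assumptions made for illustration. The reported page is the one on which a paragraph's text starts, and it is only as accurate as the markers stored in the file.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: estimates page numbers from break markers stored in the file.
public class DocxPageSearch {
    private static final String W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Opens the .docx (a ZIP archive) and scans its main part, word/document.xml. */
    public static List<Integer> searchDocx(File file, String searchText)
            throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + file);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return findPages(in, searchText);
            }
        }
    }

    /** Returns the estimated page of every paragraph that contains searchText. */
    public static List<Integer> findPages(InputStream documentXml, String searchText)
            throws XMLStreamException {
        String needle = searchText.toLowerCase(Locale.ROOT);
        List<Integer> pages = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int pageNo = 1;
        int paragraphPage = 1;              // page on which the current paragraph's text starts
        boolean inText = false;             // true while inside a <w:t> element
        boolean afterExplicitBreak = false; // assumption: avoids counting one boundary twice

        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(documentXml);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && W.equals(reader.getNamespaceURI())) {
                    switch (reader.getLocalName()) {
                        case "p":
                            text.setLength(0); // nested paragraphs (text boxes) are not handled
                            break;
                        case "t":
                            inText = true;
                            break;
                        case "br":
                            if ("page".equals(reader.getAttributeValue(W, "type"))) {
                                pageNo++;
                                afterExplicitBreak = true;
                            }
                            break;
                        case "lastRenderedPageBreak":
                            if (!afterExplicitBreak) {
                                pageNo++;
                            }
                            afterExplicitBreak = false;
                            break;
                        default:
                            break;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                    if (text.length() == 0) {
                        paragraphPage = pageNo;
                    }
                    text.append(reader.getText());
                    afterExplicitBreak = false;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && W.equals(reader.getNamespaceURI())) {
                    if ("t".equals(reader.getLocalName())) {
                        inText = false;
                    } else if ("p".equals(reader.getLocalName())
                            && text.toString().toLowerCase(Locale.ROOT).contains(needle)) {
                        pages.add(paragraphPage);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return pages;
    }
}
```

Calling `DocxPageSearch.searchDocx(file, searchText)` returns one entry per matching paragraph, so the original `println` can be issued for each returned page. Files written by tools that never store `lastRenderedPageBreak` will only reflect explicit page breaks.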