Extracting MS Word Table Cell as image?

Posted 2019-06-20 01:55

Question:

I need to extract table cells as images. A cell may contain mixed content (text plus an image), which I need to merge into a single image. I can get the plain text, but I have no idea how to get the image together with the text. I am not sure whether Apache POI can help.

Has anyone done something like this earlier?

public static void readTablesDataInDocx(XWPFDocument doc) {
    List<XWPFTable> tables = doc.getTables();
    System.out.println("==========No Of Tables in Document=============================================" + tables.size());
    int tableIdx = 1;
    for (XWPFTable xwpfTable : tables) {
        System.out.println("================table -" + tableIdx + "===Data==");
        int rowIdx = 1;
        for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
            System.out.println("Row -" + rowIdx);
            int colIdx = 1;
            for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                if (xwpfTableCell != null) {
                    System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());
                }
                colIdx++;
            }
            System.out.println("");
            rowIdx++;
        }
        tableIdx++;
        System.out.println("");
    }
}

I can already get the text of each cell with this call:

System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());

How do I get the Image if a cell also contains one?

Answer 1:

Try this code; it works for me:

try (XWPFDocument doc = new XWPFDocument(new FileInputStream(fileName))) {
    for (XWPFTable xwpfTable : doc.getTables()) {
        for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
            for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                if (xwpfTableCell != null) {
                    // Text content of the cell
                    System.out.println(xwpfTableCell.getText());
                    // Pictures are attached to the runs inside each paragraph
                    for (XWPFParagraph p : xwpfTableCell.getParagraphs()) {
                        for (XWPFRun run : p.getRuns()) {
                            for (XWPFPicture pic : run.getEmbeddedPictures()) {
                                byte[] pictureData = pic.getPictureData().getData();
                                // Print the size; concatenating the array itself only prints its reference
                                System.out.println("picture : " + pictureData.length + " bytes");
                            }
                        }
                    }
                }
            }
        }
    }
}


Answer 2:

When you have a Cell, you can get hold of the paragraphs that form that Cell. These paragraphs are in turn formed by Runs, which you can obtain by calling the getRuns method. Runs themselves can contain embedded images, which you can obtain by calling the getEmbeddedPictures method.

You can therefore have a method that gets the embedded pictures of a cell:

public static void printDescriptionOfImagesInCell(XWPFTableCell cell) {
    List<XWPFParagraph> paragraphs = cell.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        List<XWPFRun> runs = paragraph.getRuns();
        for (XWPFRun run : runs) {
            List<XWPFPicture> pictures = run.getEmbeddedPictures();
            for (XWPFPicture picture : pictures) {
                //Do anything you want with the picture:
                System.out.println("Picture: " + picture.getDescription());
            }
        }
    }
}

The XWPFPicture documentation describes what else is available on a picture. For example, picture.getPictureData() gives access to the image bytes and the file name, so you can change the method to return those instead of printing the description.
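
Neither answer covers the last step the question asks about: merging the text and the pictures into one image. A minimal sketch of that step, using only the JDK (java.awt and javax.imageio), might look like the following. It assumes you have already collected the cell text with getText() and each picture's bytes with getPictureData().getData(). The class CellImageComposer, the compose method, and the layout constants are hypothetical names chosen for illustration and are not part of Apache POI.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of Apache POI.
public class CellImageComposer {
    static final int LINE_HEIGHT = 16; // assumed fixed height of one text line, in pixels
    static final int PADDING = 4;      // assumed margin around and between items
    static final int MIN_WIDTH = 200;  // assumed minimum canvas width

    /** Draws the text lines first, then each decodable picture below them, on a white canvas. */
    public static BufferedImage compose(String text, List<byte[]> pictures) throws IOException {
        String[] lines = text.isEmpty() ? new String[0] : text.split("\n");

        // Decode the pictures and work out the canvas size.
        List<BufferedImage> images = new ArrayList<>();
        int width = MIN_WIDTH;
        int height = PADDING + lines.length * LINE_HEIGHT;
        for (byte[] bytes : pictures) {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(bytes));
            if (img == null) {
                continue; // ImageIO has no reader for this format, so skip it
            }
            images.add(img);
            width = Math.max(width, img.getWidth() + 2 * PADDING);
            height += PADDING + img.getHeight();
        }
        height += PADDING;

        BufferedImage canvas = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK);
            int y = PADDING;
            for (String line : lines) {
                y += LINE_HEIGHT;
                g.drawString(line, PADDING, y - 4); // baseline a little above the line bottom
            }
            for (BufferedImage img : images) {
                y += PADDING;
                g.drawImage(img, PADDING, y, null);
                y += img.getHeight();
            }
        } finally {
            g.dispose();
        }
        return canvas;
    }
}
```

The result can be written out with ImageIO.write(canvas, "png", file). This sketch does not wrap or measure text, so lines wider than the canvas are clipped, and it ignores the original position of pictures relative to the text inside the cell.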