I need to extract table cells as images. The cells may contain mixed content (text + image), which I need to merge into a single image. I am able to get the text, but I have no idea how to get an image together with the text. I am not sure whether Apache POI can help.
Has anyone done something like this before?
    public static void readTablesDataInDocx(XWPFDocument doc) {
        int tableIdx = 1;
        List<XWPFTable> tables = doc.getTables();
        System.out.println("==========No Of Tables in Document=============================================" + tables.size());
        for (XWPFTable xwpfTable : tables) {
            System.out.println("================table -" + tableIdx + "===Data==");
            int rowIdx = 1;
            for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
                System.out.println("Row -" + rowIdx);
                int colIdx = 1;
                for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                    if (xwpfTableCell != null) {
                        System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());
                    }
                    colIdx++;
                }
                System.out.println("");
                rowIdx++;
            }
            tableIdx++;
            System.out.println("");
        }
    }
With this call I can already get the text of a cell:
    System.out.print("\t" + colIdx + "- column value: " + xwpfTableCell.getText());
How do I get the image if a cell also contains one?
Once you have a cell, you can get hold of the paragraphs that form it by calling the getParagraphs method. These paragraphs are in turn formed by runs, which you can obtain by calling the getRuns method. Runs can contain embedded images, which you can obtain by calling the getEmbeddedPictures method. You can therefore write a method that collects the embedded pictures of a cell.
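A minimal sketch of such a method, assuming the Apache POI XWPF classes (poi-ooxml) are on the classpath. The class name CellPictures and the method name getCellPictures are illustrative and not part of POI. The sketch only visits paragraphs that sit directly inside the cell, so pictures in nested tables would be missed.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;

// Hypothetical helper class; not part of Apache POI.
public class CellPictures {

    // Collects every picture embedded in the runs of the cell's paragraphs.
    public static List<XWPFPicture> getCellPictures(XWPFTableCell cell) {
        List<XWPFPicture> pictures = new ArrayList<>();
        for (XWPFParagraph paragraph : cell.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                pictures.addAll(run.getEmbeddedPictures());
            }
        }
        return pictures;
    }
}
```

In the loop from the question, this could be called as CellPictures.getCellPictures(xwpfTableCell) right next to xwpfTableCell.getText().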
You should be able to find out more about the actual pictures from the XWPFPicture documentation, and change the method to return the image data, file name, etc.
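As a rough illustration of reading the image data and name, the sketch below goes from each XWPFPicture to its XWPFPictureData and reports the file name and byte count. The class name CellPictureInfo and the method name describePictures are assumptions made for this example. The bytes returned by getData() are the raw image file contents, which could be passed to javax.imageio.ImageIO for common formats before drawing them next to the cell text.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;

// Hypothetical helper class; not part of Apache POI.
public class CellPictureInfo {

    // Returns one "fileName (N bytes)" entry per picture found in the cell.
    public static List<String> describePictures(XWPFTableCell cell) {
        List<String> descriptions = new ArrayList<>();
        for (XWPFParagraph paragraph : cell.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                for (XWPFPicture picture : run.getEmbeddedPictures()) {
                    XWPFPictureData data = picture.getPictureData();
                    if (data == null) {
                        continue; // picture without resolvable image data
                    }
                    byte[] bytes = data.getData(); // raw image file contents
                    descriptions.add(data.getFileName() + " (" + bytes.length + " bytes)");
                }
            }
        }
        return descriptions;
    }
}
```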
Try this code; it works for me.