How to get pictures with names from an xls file us

2019-08-09 22:35发布

问题:

Using workbook.getAllPictures() I can get an array of picture data but unfortunately it is only the data and those objects have no methods for accessing the name of the picture or any other related information.

There is a HSSFPicture class which would contain all the details of the picture but how to get for example an array of those objects from the xls?

Update:

Found SO question How can I find a cell, which contain a picture in apache poi which has a method for looping through all the pictures in the worksheet. That works.

Now that I was able to try the HSSFPicture class I found out that the getFileName() method is returning the file name without the extension. I can use the getPictureData().suggestFileExtension() to get a suggested file extension but I really would need to get the extension the picture had when it was added into the xls file. Would there be a way to get it?

Update 2:

The pictures are added into the xls with a macro. This is the part of macro that is adding the images into the sheet. fname is the full path and imageName is the file name, both are including the extension.

Set img = Sheets("Receipt images").Pictures.Insert(fname)
img.Left = 10
img.top = top + 10
img.Name = imageName
Set img = Nothing

The routine to check if the picture already exists in the Excel file.

For Each img In Sheets("Receipt images").Shapes
    If img.Name = imageName Then
        Set foundImage = img
        Exit For
    End If
Next

This recognizes that "image.jpg" is different from "image.gif", so the img.Name includes the extension.

回答1:

The shape names are not in the default POI objects. So if we need them we have to deal with the underlying objects. That is for the shapes in HSSF mainly the EscherAggregate (http://poi.apache.org/apidocs/org/apache/poi/hssf/record/EscherAggregate.html) which we can get from the sheet. From its parent class AbstractEscherHolderRecord we can get all EscherOptRecords which contains the options of the shapes. In those options are also to find the groupshape.shapenames.

My example is not the complete solution. It is only provided to show which objects could be used to achieve this.

Example:

import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;

import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

import org.apache.poi.hssf.record.*;
import org.apache.poi.ddf.*;

import java.util.List;
import java.util.ArrayList;

class ShapeNameTestHSSF {

 public static void main(String[] args) {
  try {

   InputStream inp = new FileInputStream("workbook1.xls");
   Workbook wb = WorkbookFactory.create(inp);

   Sheet sheet = wb.getSheetAt(0);

   EscherAggregate escherAggregate = ((HSSFSheet)sheet).getDrawingEscherAggregate();

   EscherContainerRecord escherContainer = escherAggregate.getEscherContainer().getChildContainers().get(0); 
   //throws java.lang.NullPointerException if no Container present

   List<EscherRecord> escherOptRecords = new ArrayList<EscherRecord>();

   escherContainer.getRecordsById(EscherOptRecord.RECORD_ID, escherOptRecords);

   for (EscherRecord escherOptRecord : escherOptRecords) {
    for (EscherProperty escherProperty : ((EscherOptRecord)escherOptRecord).getEscherProperties()) {
     System.out.println(escherProperty.getName());
     if (escherProperty.isComplex()) {
      System.out.println(new String(((EscherComplexProperty)escherProperty).getComplexData(), "UTF-16LE"));
     } else {
      if (escherProperty.isBlipId()) System.out.print("BlipId = ImageId = ");
      System.out.println(((EscherSimpleProperty)escherProperty).getPropertyValue());
     }
     System.out.println("=============================");
    }
    System.out.println(":::::::::::::::::::::::::::::");
   }


   FileOutputStream fileOut = new FileOutputStream("workbook1.xls");
   wb.write(fileOut);
   fileOut.flush();
   fileOut.close();

  } catch (InvalidFormatException ifex) {
  } catch (FileNotFoundException fnfex) {
  } catch (IOException ioex) {
  }
 }
}

Again: This is not a ready to use solution. A ready to use solution cannot be provided here, because of the complexity of the EscherRecords. Maybe to get the correct EscherRecords for the image shapes and their related EscherOptRecords, you have recursive to loop through all EscherRecords in the EscherAggregate checking whether they are ContainerRecords and if so loop through its children and so on.



回答2:

Start here:

http://poi.apache.org/spreadsheet/quick-guide.html#Images

this tutorial can help you to extract an image's information from an xls spreadsheet using Apache POI