Get excel sheet names for large excel files using

2019-07-11 06:55发布

问题:

I have the following code that I use to get the sheet names of an excel file(.xlsx)

    XSSFWorkbook workBookXlsx = new XSSFWorkbook(new FileInputStream(pathToFile));
    ArrayList<String> sheetNames = new ArrayList<>();

    int numberOfSheets = workBookXlsx.getNumberOfSheets();
    for (int i = 0; i < numberOfSheets; i++) {
        sheetNames.add(workBookXlsx.getSheetAt(i).getSheetName());
    }

    workBookXlsx = null;

The issue I have with the above code is that it takes a lot of memory(~700MB) & a long time(5-6s) to create the XSSFWorkbook for a file of size 9MB. Even setting the workBookXlsx to null doesn't release the memory taken by the javaw(I know gc may or maynot be called & JVM wont release memory just because I have set a variable to null)

I did go through the documentation of Workbook, XSSFWorkbook & from what I understood, there is no method that will help me get the sheet names with low memory imprint.

The one solution I have found is to manually unzip the .xlsx file and read the contents of the .\xl\woorkbook.xml to get the sheet names and the r:id

Is there an API for getting the sheet names in an .xlsx file without large memory imprint?

回答1:

To show what @Gagravarr probably meant with his comment:

The XSSFReader contains a method XSSFReader.getSheetsData which "Returns an Iterator which will let you get at all the different Sheets in turn. Each sheet's InputStream is only opened when fetched from the Iterator. It's up to you to close the InputStreams when done with each one.". But as often this is not the whole truth. In truth it returns a XSSFReader.SheetIterator which has a method XSSFReader.SheetIterator.getSheetName to get the sheet names.

Example:

import java.io.InputStream;
import java.io.FileInputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

import java.util.Iterator;

public class ExcelXSSFGetSheetNamesXSSFReader {

 public static void main(String[] args) throws Exception {

  OPCPackage pkg = OPCPackage.open(new FileInputStream("Example.xlsx"));
  XSSFReader r = new XSSFReader( pkg );
  Iterator<InputStream> sheets = r.getSheetsData();

  if (sheets instanceof XSSFReader.SheetIterator) {
   XSSFReader.SheetIterator sheetiterator = (XSSFReader.SheetIterator)sheets;

   while (sheetiterator.hasNext()) {
    InputStream dummy = sheetiterator.next();

    System.out.println(sheetiterator.getSheetName());

    dummy.close();
   }
  }

  pkg.close();
 }
}

Conclusion: Currently you cannot work with apache poi only by trusting the API documentation. Instead you must always have a look at the source code.