Slow XSSFWorkbook and WorkbookFactory when reading

2019-06-06 19:08发布

问题:

I've seen developers have had this problem since a few years ago. I have studied many forums and the official POI documents. Nonetheless I haven't found an answer yet. So the problem is.. I have tried the following two snippets:

Workbook wb = WorkbookFactory.create(new File("spreadsheet.xlsx"));

and

File file = new File("C:\\spreadsheet.xlsx");
OPCPackage opcPackage = OPCPackage.open(file.getAbsolutePath());
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);

and either of the approaches takes about 5-6min (if the application doesn't run out of memory) to process a simple and fairly small spreadsheet.xlsx file (200KB).

What do I need to do to fix this? (I'm using Apache POI 3.9)

/*****************************/

The process takes a long time in the following location:

public class XSSFSheet extends POIXMLDocumentPart implements Sheet{
...
protected void read(InputStream is) throws IOException {
    try {
      -->>> worksheet = WorksheetDocument.Factory.parse(is).getWorksheet();
    } catch (XmlException e){
        throw new POIXMLException(e);
    }
}
...

I can't debug further. The VisualVM also says the same thing..!

回答1:

One factor that might be contributing to the load time is that the data has been pasted into the worksheet so that the used range includes every row, ie when you use the sheet.usedrange rows count it returns > 1,000,000 rows.. Not sure how this happens but I found that I needed to perform an intermediary step wherein prior to loading the workbook I 'cleaned' it by using some vba script. The workbook has around 20 sheets of around 5000 rows each, each of which are filled out by different parts o the business, and it takes a fairly long time (maybe 4 minutes) to load but that is acceptable in this case. Before I added the cleaning stage it ran for over 30 minutes, which was not acceptable....

A user runs the process I am referring to, bu pressing two buttons. The first cleans, the second does the rest. The first process is triggered using Runtime.getruntime.exec and creates an empty text file that the second process will not run unless the test file is there.