java.lang.OutOfMemoryError: GC overhead limit exce

Posted 2019-06-03 14:42

Question:

I understand what the error means: my program is consuming too much memory and, over a long period of time, is not recovering it.

My program is just reading a 6.2 MB xlsx file when the memory issue occurs.

When I monitor the program, it very quickly reaches 1.2 GB of memory consumption and then crashes. How can it reach 1.2 GB when reading a 6.2 MB file?

Is there a way to open the file in chunks, so that it doesn't have to be loaded into memory all at once? Or is there any other solution?

The part below is exactly what causes it. But since it is a library, shouldn't it handle this smartly somehow? The file has only 200,000 rows with only 3 columns. In the future, I need it to work with approximately 1 million records and more columns...

CODE:

Workbook myWorkBook;
Sheet mySheet;
if (filePath.contains(".xlsx")) {
    // Finds the workbook instance for XLSX file
    myWorkBook = new XSSFWorkbook(fis);
    // Return first sheet from the XLSX workbook
    mySheet = myWorkBook.getSheetAt(0);
    myWorkBook.close(); // Should I close myWorkBook before I get data from it?
}

Answer 1:

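If you wish to work with large XLSX files, you need to use the streaming XSSFReader class. XSSFWorkbook builds the whole workbook as an in-memory object model, and an xlsx file is zip-compressed XML, which is how a 6.2 MB file can grow to more than a gigabyte once loaded. Since the data is XML, you can use StAX to process the contents efficiently.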

Here's one way to get the InputStream for a given sheet from the xlsx file.

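import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;

// Fragment from inside a method: `file`, `sheetName` and the field `in` are defined elsewhere.
OPCPackage opc = OPCPackage.open(file);
XSSFReader xssfReader = new XSSFReader(opc);
// String cells store an index into this table, so it is needed later to resolve their text.
SharedStringsTable sst = xssfReader.getSharedStringsTable();
XSSFReader.SheetIterator itr = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (itr.hasNext()) {
    InputStream sheetStream = itr.next();
    if (itr.getSheetName().equals(sheetName)) {  // Or you can keep track of sheet numbers
        in = sheetStream;  // Keep the stream of the wanted sheet open for parsing
        return;
    } else {
        sheetStream.close();  // Not the sheet we want, release it
    }
}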

The relevant elements in the sheet XML are <row> and <c> (for cell). You can create a small xlsx file, unzip it, and examine the XML inside for more information.

Edit: There are some examples on processing the data with SAX, but using StAX is a lot nicer and just as efficient.
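
As a minimal sketch of the StAX approach (not the answerer's own code), the following reads a sheet stream such as sheetStream above one row at a time, using only the JDK's javax.xml.stream API. The class name SheetStaxSketch, the RowHandler callback, and the sharedStrings list are hypothetical names introduced for illustration. The sketch assumes the caller has already loaded the shared strings into a plain list, and it only handles cells that carry a <v> value: empty cells and inline strings are skipped, so column positions are not preserved.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class SheetStaxSketch {

    /** Hypothetical callback: receives one row at a time so rows do not accumulate in memory. */
    interface RowHandler {
        void onRow(int rowNumber, List<String> cells);
    }

    /**
     * Streams the <row>/<c>/<v> elements of a worksheet XML part.
     * sharedStrings is assumed to hold the workbook's shared strings in index order.
     */
    static void readRows(InputStream sheetXml, List<String> sharedStrings, RowHandler handler)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Sheet XML needs neither DTDs nor external entities, so switch them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
        try {
            List<String> cells = null;   // cells of the row currently being read
            int rowNumber = 0;
            String cellType = null;      // value of the t attribute on the current <c>

            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        cells = new ArrayList<>();
                        // The r attribute is optional; fall back to counting rows.
                        String r = reader.getAttributeValue(null, "r");
                        rowNumber = (r != null) ? Integer.parseInt(r) : rowNumber + 1;
                    } else if ("c".equals(name)) {
                        cellType = reader.getAttributeValue(null, "t");
                    } else if ("v".equals(name) && cells != null) {
                        String raw = reader.getElementText();
                        // t="s" means the value is an index into the shared strings.
                        cells.add("s".equals(cellType)
                                ? sharedStrings.get(Integer.parseInt(raw))
                                : raw);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    handler.onRow(rowNumber, cells);
                    cells = null;
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

A caller would pass the sheet's InputStream, the shared strings, and a lambda such as (n, cells) -> process(cells), where process is whatever the application does with a row. Because each row is handed to the callback and then dropped, memory use should depend mainly on the size of the shared strings list and the widest row rather than on the number of rows.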