Java Apache-poi, memory leak with excel files

I need to read 15000 Excel files for my thesis. I'm using Apache POI to open and later analyze them, but after around 5000 files I get the following exception and stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3044)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3065)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3263)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4682)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:92)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:173)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:165)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:417)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:382)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:178)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:249)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:302)
at de.spreadsheet_realtions.analysis.WorkbookAnalysis.analyze(WorkbookAnalysis.java:18)

Code (at the moment it just opens and closes each file):

public static void main(String[] args) {
    new WorkbookAnalysis().start(); // start() is an instance method, so it needs an instance
}

public void start(){
    File[] files = getAllFiles(Config.folder);
    ZipSecureFile.setMinInflateRatio(0.00);
    for(File f: files){
        analyze(f);
    }
}

public void analyze(File file){
    Workbook  workbook = null;
    try {
        workbook = new XSSFWorkbook(file); //line 18
    } catch (Exception e1) {e1.printStackTrace(); return;}
    // the code to analyze the workbook will go here later
    try {
        workbook.close();
    } catch (Exception e) {e.printStackTrace();}
}

I also tried OPCPackage.open(file) and got the same result.

What am I doing wrong, or what can I do to solve this problem? Thanks for any help.

EDIT: The same happens with the code below.

try (XSSFWorkbook workbook = new XSSFWorkbook(file)){
} catch (Exception e1) {e1.printStackTrace(); return;}

Tags: java excel memory-leaks apache-poi

2 Answers

Anthone

Reply #2 · 2019-05-11 15:40

If an exception is thrown in your first try block, you return before reaching workbook.close(), so the workbook is never closed.

Put the close in a finally block.

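Workbook workbook = null;
try {
  workbook = new XSSFWorkbook(file); // line 18

  // the code to analyze the workbook will go here later
} catch (Exception e1) {
  e1.printStackTrace();
  return;
} finally {
  // runs on success, on an exception, and on the early return above
  if (workbook != null) {
    try {
      workbook.close(); // close() declares IOException
    } catch (IOException e2) {
      e2.printStackTrace();
    }
  }
}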

Or, better, use try-with-resources.

try (XSSFWorkbook workbook = new XSSFWorkbook(file)) {
  // later would be here the code to analyze
} catch (Exception e1) {
  e1.printStackTrace();
}
// No need for explicit close.
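
To see what try-with-resources guarantees when one file in a long loop fails, here is a small JDK-only sketch. TrackedResource is a hypothetical stand-in for a workbook (it only records open and close calls), and CloseDemo and processAll are names made up for this illustration, not part of POI.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseDemo {

    /** Hypothetical stand-in for a workbook: logs when it is opened and closed. */
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Processes every name; a failure on one item does not stop the loop. */
    static List<String> processAll(List<String> names) {
        List<String> log = new ArrayList<>();
        for (String name : names) {
            try (TrackedResource r = new TrackedResource(name, log)) {
                if (name.startsWith("bad")) {
                    throw new IllegalStateException("cannot analyze " + name);
                }
                log.add("analyzed " + name);
            } catch (Exception e) {
                // The resource has already been closed when this block runs.
                log.add("failed " + name);
            }
        }
        return log;
    }
}
```

For the input "a", "bad", "c" the log shows every open paired with a close, including the failing item, whose close is recorded before its failure is handled.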


劳资没心，怎么记你

Reply #3 · 2019-05-11 15:52

By default, POI's usermodel API (XSSFWorkbook) holds the whole workbook in memory, so a large workbook requires a different approach.

When writing, you can use SXSSF. Most calls are the same as with XSSF, except that only a limited window of rows is kept in memory at a time.

In your case, you are reading. For this you can use their "event driven" API. The basic idea here is that you do not get the workbook as one huge object. Instead, you get it piecemeal, as it is read, and you can save off as much as you wish into your own data-structure. Or, you can simply process it as you read it and not save very much.

Since this is a lower-level API (driven by the structure of the data being read), there is one approach for XLS and a different approach for XLSX. Look at the POI "How To" page, and find the section titled "XSSF and SAX (Event API)".

That example demonstrates how to detect the value of each cell as it is read in. (You'll need the xercesImpl.jar on your library path.)
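
To illustrate the event-driven idea without depending on a particular POI version, here is a minimal JDK-only sketch. It is not POI's Event API: it relies only on the fact that an .xlsx file is a ZIP archive whose sheets are XML parts, assumed here to live under xl/worksheets/. The names SheetEventDemo, CellHandler, readCells and writeDemoFile are made up for this example, and the demo file is a bare ZIP with a single sheet part, not something Excel would open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SheetEventDemo {

    /** Collects "cellRef=rawValue" pairs as SAX events arrive; no document tree is built. */
    static final class CellHandler extends DefaultHandler {
        final List<String> cells = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String cellRef;
        private boolean inValue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(qName)) {          // <c r="A1"> starts a cell
                cellRef = attrs.getValue("r");
            } else if ("v".equals(qName)) {   // <v> holds the cell's raw value
                inValue = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                inValue = false;
                cells.add(cellRef + "=" + text);
            }
        }
    }

    /** Streams every sheet part of the archive through the handler. */
    static List<String> readCells(Path xlsx) throws Exception {
        CellHandler handler = new CellHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: sheet parts are named xl/worksheets/<something>.xml
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    try (InputStream in = zip.getInputStream(entry)) {
                        factory.newSAXParser().parse(in, handler);
                    }
                }
            }
        }
        return handler.cells;
    }

    /** Writes a tiny ZIP containing one hand-written sheet part, for demonstration only. */
    static Path writeDemoFile() throws IOException {
        String sheetXml = "<worksheet><sheetData>"
            + "<row r=\"1\"><c r=\"A1\"><v>42</v></c><c r=\"B1\"><v>3.5</v></c></row>"
            + "</sheetData></worksheet>";
        Path file = Files.createTempFile("demo", ".xlsx");
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(file))) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write(sheetXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return file;
    }
}
```

The values collected this way are raw: for cells marked t="s" the number is an index into the shared strings part, and styles, dates and formulas are not interpreted. The POI "XSSF and SAX (Event API)" example referenced above shows how to resolve shared strings with POI's own classes.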

