I need to auto-fit all rows in large (30k+ rows) xlsx file.
The following code via apache poi works on small files, but goes out with OutOfMemoryError
on large ones:
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);
for (Row row : sheet) {
row.setHeight((short) -1);
}
workbook.write(outputStream);
Update: Unfortunately, increasing heap size is not an option - OutOfMemoryError
appears at -Xmx1024m
and 30k rows is not an upper limit.
I used Event API for a HSSF file (.xls), and I discovered terrible lack of documentation about order of records.
I was having the same problem with a lot less of row, but large strings.
Since I don't have to keep my data loaded, I found out that I can use SXSSF instead of XSSF.
They have similar interfaces, which helps if you have a lot of code already writen. But with SXSSF it is possible to set the amount of rows you keep loaded.
Here is the link. http://poi.apache.org/spreadsheet/how-to.html#sxssf
If you are writing to XLSX, I found an improvement by writing to different sheets of the same Excel file. You also might find an improvement by writing to different Excel files. But first try writing to different sheets.
If you want to auto-fit or set styles or write all rows in large (30k+ rows) xlsx file,use SXSSFWorkbook.Here is the sample code that helps you...
A dramatic improvement in memory usage can be done by using a File instead of a Stream. (It is better to use a streaming API, but the Streaming API's have limitations, see http://poi.apache.org/spreadsheet/index.html)
So instead of
do
This is according to : http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream
Files vs InputStreams
"When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file."
The best example for this is described in the following stack overflow thread: Error While Reading Large Excel Files (xlsx) Via Apache POI
The code snippet in the main answer in that topic illustrates the Apache POI wrappings around SAX xml parsing, and how you can trivially loop over all the sheets and then over each individual cell.
The code is stale with current implementation of the Apache POI API, as the endRow() api provides the current row number that has finished to be processing.
With that code snippet it should be trivial for your to parse a big XLSX file cell by cell. E.g. for each sheet; for each row cell; row has ended event. You could trivial create app logic where at the of each row you create a Map of columneName to cellValue.