Processing large xlsx file

2020-01-24 04:14发布

I need to auto-fit all rows in large (30k+ rows) xlsx file.

The following code via apache poi works on small files, but goes out with OutOfMemoryError on large ones:

Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);

for (Row row : sheet) {
    row.setHeight((short) -1);
}

workbook.write(outputStream);

Update: Unfortunately, increasing heap size is not an option - OutOfMemoryError appears at -Xmx1024m and 30k rows is not an upper limit.

9条回答
劳资没心,怎么记你
2楼-- · 2020-01-24 04:21

I used Event API for a HSSF file (.xls), and I discovered terrible lack of documentation about order of records.

查看更多
相关推荐>>
3楼-- · 2020-01-24 04:22

I was having the same problem with a lot less of row, but large strings.

Since I don't have to keep my data loaded, I found out that I can use SXSSF instead of XSSF.

They have similar interfaces, which helps if you have a lot of code already writen. But with SXSSF it is possible to set the amount of rows you keep loaded.

Here is the link. http://poi.apache.org/spreadsheet/how-to.html#sxssf

查看更多
倾城 Initia
4楼-- · 2020-01-24 04:30

If you are writing to XLSX, I found an improvement by writing to different sheets of the same Excel file. You also might find an improvement by writing to different Excel files. But first try writing to different sheets.

查看更多
你好瞎i
5楼-- · 2020-01-24 04:31

If you want to auto-fit or set styles or write all rows in large (30k+ rows) xlsx file,use SXSSFWorkbook.Here is the sample code that helps you...

SXSSFWorkbook wb = new SXSSFWorkbook();
            SXSSFSheet sheet = (SXSSFSheet) wb.createSheet("writetoexcel");
            Font font = wb.createFont();
                font.setBoldweight((short) 700);
                // Create Styles for sheet.
                XSSFCellStyle Style = (XSSFCellStyle) wb.createCellStyle();
                Style.setFillForegroundColor(new XSSFColor(java.awt.Color.LIGHT_GRAY));
                Style.setFillPattern(XSSFCellStyle.SOLID_FOREGROUND);
                Style.setFont(font);
                //iterating r number of rows
            for (int r=0;r < 30000; r++ )
            {
                Row row = sheet.createRow(r);
                //iterating c number of columns
                for (int c=0;c < 75; c++ )
                {
                    Cell cell = row.createCell(c);
                    cell.setCellValue("Hello"); 
                    cell.setCellStyle(Style);
                }
    }
            FileOutputStream fileOut = new FileOutputStream("E:" + File.separator + "NewTest.xlsx");
查看更多
闹够了就滚
6楼-- · 2020-01-24 04:35

A dramatic improvement in memory usage can be done by using a File instead of a Stream. (It is better to use a streaming API, but the Streaming API's have limitations, see http://poi.apache.org/spreadsheet/index.html)

So instead of

Workbook workbook = WorkbookFactory.create(inputStream);

do

Workbook workbook = WorkbookFactory.create(new File("yourfile.xlsx"));

This is according to : http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream

Files vs InputStreams

"When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file."

查看更多
forever°为你锁心
7楼-- · 2020-01-24 04:36

The best example for this is described in the following stack overflow thread: Error While Reading Large Excel Files (xlsx) Via Apache POI

The code snippet in the main answer in that topic illustrates the Apache POI wrappings around SAX xml parsing, and how you can trivially loop over all the sheets and then over each individual cell.

The code is stale with current implementation of the Apache POI API, as the endRow() api provides the current row number that has finished to be processing.

With that code snippet it should be trivial for your to parse a big XLSX file cell by cell. E.g. for each sheet; for each row cell; row has ended event. You could trivial create app logic where at the of each row you create a Map of columneName to cellValue.

查看更多
登录 后发表回答