How do I read and edit huge Excel files using POI?

Posted 2019-08-02 06:14

Question:

I have a requirement to do the following:

1) Make a copy of a huge Excel file (1400*1400).

2) Read the copied file, add new columns and rows, and edit existing cells at the same time.

3) This will be a standalone program, not one running on a server. It must have a low memory footprint and run fast.

I have done some reading and found the following:

1) There is no API to copy such a huge file.

2) SXSSF can be used for writing but not for reading.

3) XSSF and SAX (Event API) can be used for reading but not for editing. If I try to read the data and store it as objects, I will run into memory issues again.

Can you please help me with how to do this?

Answer 1:

Assuming your memory size is large enough to use XSSF/SAX to read and SXSSF to write, let me suggest the following solution.

1) Read the file using XSSF/SAX. For each row, create an object with the row data and immediately write it out to a file using ObjectOutputStream or any other output format you find convenient. You will create a separate file for each row, and there will only ever be 1 row object in memory, because you can keep reusing the same object for each row's data.

2) Make whatever modifications you need to. For rows that need to be modified, read the corresponding file back into your row object, modify as needed, and write it back out. For new rows, simply set the data in your row object and write it out to a new file.

3) Use SXSSF to reassemble your spreadsheet by reading 1 row object file at a time and storing it in your output spreadsheet.

That way, you will only have 1 row in memory at a time.
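The per-row files from steps 1 and 2 could be handled roughly as follows. This is only a minimal sketch using plain JDK serialization: the `RowData` and `RowStore` classes and the `row-N.ser` file naming are hypothetical names made up for illustration, and cell values are assumed to fit in plain strings.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for one spreadsheet row; cell values are kept as strings. */
final class RowData implements Serializable {
    private static final long serialVersionUID = 1L;
    int rowIndex;
    String[] cells;
}

/** Hypothetical helper that keeps each row in its own file inside one directory. */
final class RowStore {
    private final Path dir;

    RowStore(Path dir) {
        this.dir = dir;
    }

    // One file per row; the name pattern is an arbitrary choice for this sketch.
    private Path fileFor(int rowIndex) {
        return dir.resolve("row-" + rowIndex + ".ser");
    }

    /** Writes the row to its file, overwriting any earlier version of that row. */
    void save(RowData row) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(fileFor(row.rowIndex)))) {
            out.writeObject(row);
        }
    }

    /** Reads a single row back so it can be modified and saved again. */
    RowData load(int rowIndex) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(fileFor(rowIndex)))) {
            return (RowData) in.readObject();
        }
    }
}
```

Because every `save` call opens a fresh stream, the same `RowData` instance can be refilled and saved repeatedly. For step 3, the rows would be loaded in index order and copied into the output sheet one at a time.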



Answer 2:

If there is so much data that 'Out of Memory' or 'GC overhead limit exceeded' errors occur, the data can first be written to an XML file. That XML file can then replace the sheet inside the Excel file, which keeps memory usage to a minimum.

In an .xlsx file, each sheet is stored as an XML entry inside a ZIP archive. Using java.util.zip.ZipFile, each entry can be identified. The XML entry for the sheet can then be replaced with the generated XML, so that the Excel sheet contains the expected data.
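The entry replacement described above might look like the sketch below, which uses only the JDK. The class name `SheetSubstitutor` is hypothetical, and the entry name passed in (for example `xl/worksheets/sheet1.xml`) is an assumption about how the template workbook is laid out, so check the actual archive before relying on it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: copies an .xlsx archive while swapping in one sheet's XML. */
final class SheetSubstitutor {

    static void substitute(Path template, Path sheetXml, String entryName, Path output)
            throws IOException {
        try (ZipFile zip = new ZipFile(template.toFile());
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(output))) {

            // Copy every entry except the sheet that is being replaced.
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.getName().equals(entryName)) {
                    continue;
                }
                zos.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream in = zip.getInputStream(entry)) {
                    copy(in, zos);
                }
                zos.closeEntry();
            }

            // Add the generated sheet XML under the original entry name.
            zos.putNextEntry(new ZipEntry(entryName));
            Files.copy(sheetXml, zos);
            zos.closeEntry();
        }
    }

    // Streams data through a small fixed buffer so no entry is held fully in memory.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

The template would typically be a small workbook saved beforehand with the desired styles and an empty sheet, so that only the sheet data has to be generated as raw XML.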

The following class can be used to create the XML files:

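import java.io.IOException;
import java.io.Writer;
import org.apache.poi.ss.util.CellReference;

// Writes a worksheet part as raw SpreadsheetML, one row and one cell at a time.
public class XmlSpreadsheetWriter {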
    private final Writer _out;
    private int _rownum;

    public XmlSpreadsheetWriter(Writer out){
        _out = out;
    }

    public void beginSheet() throws IOException {
        _out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">" );
        _out.write("<sheetData>\n");
    }

    public void endSheet() throws IOException {
        _out.write("</sheetData>");
        _out.write("</worksheet>");
    }

    public void insertRow(int rownum) throws IOException {
        _out.write("<row r=\""+(rownum+1)+"\">\n");
        this._rownum = rownum;
    }

    public void endRow() throws IOException {
        _out.write("</row>\n");
    }

    public void createCell(int columnIndex, String value, int styleIndex) throws IOException {
        String ref = new CellReference(_rownum, columnIndex).formatAsString();
        _out.write("<c r=\""+ref+"\" t=\"inlineStr\"");
        _out.write(" s=\""+styleIndex+"\"");
        _out.write(">");
        // Escape XML special characters so values such as "R&D" or "a<b" do not corrupt the sheet.
        _out.write("<is><t>"+escapeXml(value)+"</t></is>");
        _out.write("</c>");
    }

    public void createCell(int columnIndex, double value, int styleIndex) throws IOException {
        String ref = new CellReference(_rownum, columnIndex).formatAsString();
        _out.write("<c r=\""+ref+"\" t=\"n\"");
        _out.write(" s=\""+styleIndex+"\"");
        _out.write(">");
        _out.write("<v>"+value+"</v>");
        _out.write("</c>");
    }

    public void createEmptyCell(int columnIndex, int styleIndex) throws IOException {
        String ref = new CellReference(_rownum, columnIndex).formatAsString();
        // A blank cell carries only its reference and style; an empty <v> is not a valid number.
        _out.write("<c r=\""+ref+"\" s=\""+styleIndex+"\"/>");
    }

    private static String escapeXml(String s) {
        // '&' must be replaced first, otherwise the entities produced next would be escaped again.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
} 


Answer 3:

If memory is the problem when processing the number of records you mentioned (i.e. 1400*1400), then getting at the XML data and processing it yourself might be a solution for you. It may not be the best solution, but it will address your low-memory requirement. The POI site points to this solution too:

"If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint."

Source: http://poi.apache.org/spreadsheet/how-to.html
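As a rough illustration of processing the underlying XML yourself, the sketch below streams one sheet through a SAX parser and only counts rows and cells. It is not the POI event API: it opens the .xlsx with java.util.zip and the JDK's SAX parser directly. The class name `SheetScanner` is hypothetical, the entry name (for example `xl/worksheets/sheet1.xml`) is an assumption about the workbook layout, and the handler assumes the sheet XML uses unprefixed element names such as `row` and `c`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that counts rows and cells without building a model in memory. */
final class SheetScanner extends DefaultHandler {
    private int rows;
    private int cells;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The default parser is not namespace-aware, so qName is the tag name as written.
        if ("row".equals(qName)) {
            rows++;
        } else if ("c".equals(qName)) {
            cells++;
        }
    }

    int rows() {
        return rows;
    }

    int cells() {
        return cells;
    }

    /** Streams the named sheet entry of an .xlsx file through a new scanner. */
    static SheetScanner scan(String xlsxPath, String entryName)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SheetScanner handler = new SheetScanner();
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                parser.parse(in, handler);
            }
        }
        return handler;
    }
}
```

A real reader would also have to handle cell values, and in particular strings that Excel stores in a separate shared-strings part rather than inline, which this sketch does not attempt.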



Tags: apache-poi