The situation is as follows;
I have a simple program which uses the Apache Poi Library to add one row of data at the end of the an exisiting xlsx file. See below
File file = new File(input);
XSSFWorkbook workbook = new XSSFWorkbook(file);
XSSFSheet sheet = workbook.getSheetAt(0);
XSSFRow row = sheet.createRow(sheet.getLastRowNum() + 1);
After this I will iterate over the row and set the CellValues. But the problem is that on the second line of the code, as shown above, I get an out of memory error. Is there a way to add a row of data to the existing xlsx file without having to read the file fully?
You can try XSSF and SAX (Event API).
If getting the XSSFWorkbook
fails because of out-of-memory error and the need is to read and write the workbook, then neither SXSSF
nor SAX
parser will help. The one is only for writing. The other is only for reading.
Both approaches in follow needs knowledge about the *.xlsx
file format which is Office Open XML. In general a *.xlsx
file is a ZIP
archive containing XML
files and other files in a special directory structure. So one can unzip the *.xlsx
file using a ZIP
software to have a look at the XML
files. The file format was first standardized by Ecma. So for further recherches I prefer Ecma Markup Language Reference. For example Row.
The ReadAndWriteTest.xlsx
used in both examples must have at least one worksheet and the first worksheet must have at least one row.
One approach could be using the DOM
methods of XMLBeans. My favorite reference for this is grepcode.
Example:
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import java.io.File;
import java.io.OutputStream;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
import org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;
import org.apache.xmlbeans.XmlOptions;
import javax.xml.namespace.QName;
import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
class DOMReadAndWriteTest {
public static void main(String[] args) {
try {
File file = new File("ReadAndWriteTest.xlsx");
//we only open the OPCPackage, we don't create a Workbook
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the SheetData, we need the SharedStringsTable
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
//get the PackagePart of the first sheet
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
//get the worksheet from the first sheet's XML
//if it even fails while parsing this, then this approach is not usable
WorksheetDocument worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
CTWorksheet worksheet = worksheetdocument.getWorksheet();
CTSheetData sheetdata = worksheet.getSheetData();
//put some data in 10 new rows"
for (int i = 0; i < 10; i++) {
int rowsCount = sheetdata.sizeOfRowArray();
CTCell ctcell= sheetdata.addNewRow().addNewC();
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("new Row " + (rowsCount + 1));
int sRef = sharedstringstable.addEntry(ctstr);
ctcell.setT(STCellType.S);
ctcell.setV(Integer.toString(sRef));
ctcell=sheetdata.getRowArray(rowsCount).addNewC();
ctcell.setV(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
}
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
//create XmlOptions for saving the worksheet
XmlOptions xmlOptions = new XmlOptions();
xmlOptions.setSaveOuter();
xmlOptions.setUseDefaultNamespace();
xmlOptions.setSaveAggressiveNamespaces();
xmlOptions.setCharacterEncoding("UTF-8");
xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
Map<String, String> map = new HashMap<String, String>();
map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
xmlOptions.setSaveSuggestedPrefixes(map);
//save the worksheet
out = sheetpart.getOutputStream();
worksheet.save(out, xmlOptions);
out.close();
opcpackage.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
This code writes 10 new Rows in sheet1 of ReadAndWriteTest.xlsx
without opening the whole workbook. But it must at least opening and parsing the sheet1 and the SharedStringsTable
. If even this fails, then this approach is not usable.
Another approach could be using StAX. This API can read and write XML event driven. And it uses streaming.
Example:
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;
import javax.xml.namespace.QName;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
class StaxReadAndWriteTest {
public static void main(String[] args) {
try {
File file = new File("ReadAndWriteTest.xlsx");
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the sheet data, we need the SharedStringsTable
//if it even fails while parsing this SharedStringsTable, then this approach is not usable
//then we must stream this XML event driven also.
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
int rowsCount = 0;
while(reader.hasNext()){ //loop over all XML in sheet1.xml
XMLEvent event = (XMLEvent)reader.next();
writer.add(event); //by default write each readed event
if(event.isStartElement()){
StartElement startElement = (StartElement)event;
QName startElementName = startElement.getName();
if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
boolean rowStart = true;
rowsCount++;
do {
event = (XMLEvent)reader.next(); //find this row's end
writer.add(event); //by default write each readed event
if(event.isEndElement()){
EndElement endElement = (EndElement)event;
QName endElementName = endElement.getName();
if(endElementName.getLocalPart().equalsIgnoreCase("row")) { //end element of row
rowStart = false;
//we assume that there is nothing else (character data) between end element of row and next element
XMLEvent nextElement = (XMLEvent)reader.peek();
QName nextElementName = null;
if (nextElement.isStartElement()) nextElementName = ((StartElement)nextElement).getName();
else if (nextElement.isEndElement()) nextElementName = ((EndElement)nextElement).getName();
if(!nextElementName.getLocalPart().equalsIgnoreCase("row")) { //next is not start element of row
//we have the last row, so we write new rows now
for (int i = 0; i < 10; i++) {
StartElement newRowStart = eventFactory.createStartElement(new QName("row"), null, null);
writer.add(newRowStart);
//start cell A
Attribute attribute = eventFactory.createAttribute("t", "s");
List attributeList = Arrays.asList(attribute);
StartElement newCellStart = eventFactory.createStartElement(new QName("c"), attributeList.iterator(), null);
writer.add(newCellStart);
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("new Row " + (rowsCount +1));
int sRef = sharedstringstable.addEntry(ctstr);
StartElement newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
Characters value = eventFactory.createCharacters(Integer.toString(sRef));
writer.add(value);
EndElement newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
EndElement newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);
//end cell A
//start cell B
newCellStart = eventFactory.createStartElement(new QName("c"), null, null);
writer.add(newCellStart);
newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
value = eventFactory.createCharacters(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
writer.add(value);
newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);
//end cell B
EndElement newRowEnd = eventFactory.createEndElement(new QName("row"), null);
writer.add(newRowEnd);
rowsCount++;
}
}
}
}
} while (rowStart);
}
}
}
writer.flush();
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
opcpackage.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
This code also writes 10 new Rows in sheet1 of ReadAndWriteTest.xlsx
without opening the whole workbook. But it must at least opening and parsing the SharedStringsTable
. If even this fails, then this approach is also not usable. But of course even the SharedStringsTable
could be streamed using StAX. But as you see in example with generating the rows and cells, this is much more complicated. So using the SharedStringsTable
makes things easier in this example.
(not enough reputation to add this as a comment)
Have you tried using SXSSFWorkbook instead of XSSFWorkbook?