我想读大Excel文件通过Apache POI XLSX,说40-50 MB。 我走出内存异常。 当前的堆存储器是3GB。
我可以读较小的Excel文件没有任何问题。 我需要一种方法来读取大型Excel文件,然后将它们早在通过Spring的Excel视图响应。
public class FetchExcel extends AbstractView {
@Override
protected void renderMergedOutputModel(
Map model, HttpServletRequest request, HttpServletResponse response)
throws Exception {
String fileName = "SomeExcel.xlsx";
response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(pkg);
ServletOutputStream respOut = response.getOutputStream();
pkg.close();
workbook.write(respOut);
respOut.flush();
workbook = null;
response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\"");
}
}
我首先开始关断使用XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in);
但是这是每Apache的POI API昂贵,所以我切换到OPC包的方式,但还是同样的效果。 我并不需要解析或处理文件,只需阅读并返回。
你没有提到是否需要修改电子表格或没有。
这可能是显而易见的,但如果你不需要在修改电子表格,那么你并不需要分析它,并把它写回退,你可以简单地从文件中读取字节,并写出字节,就像使用,说的图像,或任何其它的二进制格式。
如果确实需要将其发送给用户,然后就我所知,之前修改电子表格,您可能需要采取不同的方法。
据我所知的在Java中读取Excel文件中的每个库读取整个电子表格到内存中,所以你必须有可为可能会被同时处理的每电子表格50MB内存。 这涉及到,正如其他人所指出的那样,调整供VM堆。
如果你需要同时处理大量的电子表格,并且无法分配足够的内存,可以考虑使用可流的格式,而不是在一次读取所有到内存中。 CSV格式可以通过Excel打开,我已经通过内容类型设置为application / vnd.ms - Excel中,附件文件名设置为在“.xls的”结尾的东西在过去良好的效果,但实际上返回CSV内容。 我没有在几年尝试过这一点,所以因人而异。
下面是一个使用SAX解析器读取大型xls文件的例子。
public void parseExcel(File file) throws IOException {
OPCPackage container;
try {
container = OPCPackage.open(file.getAbsolutePath());
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
XSSFReader xssfReader = new XSSFReader(container);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (iter.hasNext()) {
InputStream stream = iter.next();
processSheet(styles, strings, stream);
stream.close();
}
} catch (InvalidFormatException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
}
}
protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
try {
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
@Override
public void startRow(int rowNum) {
}
@Override
public void endRow() {
}
@Override
public void cell(String cellReference, String formattedValue) {
}
@Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
},
false//means result instead of formula
);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
在bellwo例子中,我将添加一个完整的代码如何分析一个完整的Excel文件(对我来说60Mo)为对象的名单没有的“ 内存不足 ” 的任何问题,并且很好地工作:
import java.util.ArrayList;
import java.util.List;
class DistinctByProperty {
private static OPCPackage xlsxPackage = null;
private static PrintStream output= System.out;
private static List<MassUpdateMonitoringRow> resultMapping = new ArrayList<>();
public static void main(String[] args) throws IOException {
File file = new File("C:\\Users\\aberguig032018\\Downloads\\your_excel.xlsx");
double bytes = file.length();
double kilobytes = (bytes / 1024);
double megabytes = (kilobytes / 1024);
System.out.println("Size "+megabytes);
parseExcel(file);
}
public static void parseExcel(File file) throws IOException {
try {
xlsxPackage = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ);
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(xlsxPackage);
XSSFReader xssfReader = new XSSFReader(xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
int index = 0;
while (iter.hasNext()) {
try (InputStream stream = iter.next()) {
String sheetName = iter.getSheetName();
output.println();
output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, new MappingFromXml(resultMapping), stream);
}
++index;
}
} catch (InvalidFormatException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
}
}
private static void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, MappingFromXml mappingFromXml, InputStream sheetInputStream) throws IOException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(
styles, null, strings, mappingFromXml, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
System.out.println("Size of Array "+resultMapping.size());
} catch(ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
}
你必须添加一个CALSS实现
SheetContentsHandler
import com.sun.org.apache.xpath.internal.operations.Bool;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
public class MappingFromXml implements SheetContentsHandler {
private List<myObject> result = new ArrayList<>();
private myObject myObject = null;
private int lineNumber = 0;
/**
* Number of columns to read starting with leftmost
*/
private int minColumns = 25;
/**
* Destination for data
*/
private PrintStream output = System.out;
public MappingFromXml(List<myObject> list) {
this.result = list;
}
@Override
public void startRow(int i) {
output.println("iii " + i);
lineNumber = i;
myObject = new myObject();
}
@Override
public void endRow(int i) {
output.println("jjj " + i);
result.add(myObject);
myObject = null;
}
@Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
int columnIndex = (new CellReference(cellReference)).getCol();
if(lineNumber > 0){
switch (columnIndex) {
case 0: {//Tech id
if (formattedValue != null && !formattedValue.isEmpty())
myObject.setId(Integer.parseInt(formattedValue));
}
break;
//TODO add other cell
}
}
}
@Override
public void headerFooter(String s, boolean b, String s1) {
}
}
对于此介绍浏览更多信息链接
我也遇到OOM的同样的问题在解析XLSX文件...斗争后两天,我终于找到了下面的代码,那真是完美的;
这个代码是基于sjxlsx。 它读取在HSSF片的XLSX和商店。
[code=java]
// read the xlsx file
SimpleXLSXWorkbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));
HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();
Sheet sheetToRead = workbook.getSheet(0, false);
SheetRowReader reader = sheetToRead.newReader();
Cell[] row;
int rowPos = 0;
while ((row = reader.readRow()) != null) {
org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
int cellPos = 0;
for (Cell cell : row) {
if(cell != null){
org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
hfsCell.setCellValue(cell.getValue());
}
cellPos++;
}
rowPos++;
}
return hsfSheet;[/code]