Modifying existing xlsx spreadsheet using Apache P

2019-08-07 17:32发布

问题:

So the topic of unreadable content errors in Excel with Apache POI seems common. However, I was surprised that I still could not find an example of my issue, especially since it seems extremely simple what I'm trying to do. This leads me to believe something is probably just going over my head with my understanding of some java objects, specifically File and FileInputStream. I found out through lots of trial and error how to make it work, but my question is WHY one works and why the other does not, and would love to understand the underlying issue. I think it may help others to understand as well. Here's exactly what I'm trying to do:

Open an existing xlsx file, add a sheet to it, and save it to the same file name as its original. In essence, just modify an existing xlsx file to add a sheet.

Below is the code that does not work, and the code that does, and I"m wondering why using the File object does not work. In both examples, I have already created the TravelVouchers.xlsx file with a worksheet named Voucher_1 and it works fine to open it.

The following code leads to a NullPointerException on the travelVouchersWkBk.write(fileOut); line:

File travelVouchersFile = new File("./Output/TravelVouchers.xlsx");
Workbook travelVouchersWkBk = WorkbookFactory.create(travelVouchersFile);
travelVouchersWkBk.createSheet("Voucher_2");
FileOutputStream fileOut = new FileOutputStream(travelVouchersFile);
travelVouchersWkBk.write(fileOut);
fileOut.flush();
fileOut.close();

While the following code works great:

File travelVouchersFile = new File("./Output/TravelVouchers.xlsx");
FileInputStream fileIn = new FileInputStream(travelVouchersFile);
Workbook travelVouchersWkBk = WorkbookFactory.create(fileIn);
travelVouchersWkBk.createSheet("Voucher_2");
fileIn.close();
FileOutputStream fileOut = new FileOutputStream(travelVouchersFile);
travelVouchersWkBk.write(fileOut);
fileOut.flush();
fileOut.close();

In the code that doesn't work, it causes an error in excel that there is unreadable content, and when I choose to repair, I can open it and it only has Voucher_1. So obviously using WorkbookFactory.create with a File object doesn't work, where using a FileInputStream does, but I'm wondering what I'm not understanding about File vs FileInputStream as it relates to this issue.

Thanks a lot for the clarification, I really appreciate it! Paul

回答1:

I belive using a File with WorkbookFactory for reading a Workbook and then writing to the same Workbook-File is not a good idea until now. The apache POI documentation mentions:

Files vs InputStreams

When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.

But why using a File object allows for lower memory consumption? It's because the Workbookfactory then creates a RandomAccessFile which needs not to be fully read in memory. To verify that please read the sources of WorkbookFactory.java -> public static Workbook create(File file, String password, boolean readOnly), NPOIFSFileSystem.java -> private NPOIFSFileSystem(FileChannel channel, File srcFile, boolean readOnly, boolean closeChannelOnError), FileBackedDataSource.java -> private static RandomAccessFile newSrcFile(File file, String mode).

But as far as I know there is no possibility to get access to this RandomAccessFile. So we can't write to it and also we can't close it without closing the whole Workbook. So in the example above the RandomAccessFile will be opened for reading and writing (rw mode) while the Workbook.write(FileOutputStream) tries writing the file.

So using a File may be good for reading only or reading from one file and saving to another file. But using a File for reading and writing the same file is not a good idea until now. For this FileInputStream and FileOutputStream are better.