Apache POI Excel workbook creation taking a long t

2019-06-25 13:19发布

问题:

I've noticed that the workbook creation statement for xlsx files with Apache POI v3.10 e.g. `

Workbook wb = WorkbookFactory.create(inputStream) 

or

Workbook wb = new XSSFWorkbook(inputStream)

...is taking a long time (~30 seconds) and the file only has 72 rows with 10 columns (365KB).

It's not a problem, but it just seems a bit excessive. I'm wondering if I'm doing anything wrong or not doing something I should be doing. Instantiation of an xls file with the same data (but only 25KB) only takes 1 or 2 seconds. If this is normal, then could someone just let me know.

Edit:

This is the workbook creation code I'm using:

LOG.info("Loading Excel Workbook...");
Workbook workbook;
try {
    workbook = WorkbookFactory.create(dataStream);
} catch (InvalidFormatException e) {
    throw new IOException("Invalid file format ==> " + e.getMessage());
}
LOG.info("Workbook loaded.");

Just to be clear, dataStream is an InputStream. The 30 second delay occurs between the first and second log statements. As I said previously, I've tried replacing the factory with new XSSFWorkbook(dataStream) but the delay remains.

Edit-2:

I ran a standalone test which does nothing except the workbook initialization using 1) a File, and also 2) an InputStream where the source is the xlsx file I've been having trouble with. They both completed in ~2 seconds.

I should have added some background earlier. I'm using the Google App Engine. The input stream that I'm giving to POI is retrieved from a file upload to the server. App Engine doesn't support Servlet 3.0 (for handling file uploads) so I have to use Apache Commons FileUpload lib to retrieve the file data. Ultimately, the data I get is an InputStream retrieved from FileItemStream#openStream(). This is what I supply to POI.

So, I don't know if this is a problem with the App Engine, or if POI doesn't like the flavor of the InputStream that FileItemStream is returning. Incidentally, I cannot try the initialization with a File instead of a InputStream because App Engine doesn't allow writes to the file system.

回答1:

I would do some profiling using one of the available profiling tools, e.g. JVisualVM, Dynatrace, JProfiler, ..

Only then you know for sure where the time is spent in your code, it might be some unexpected place after all and you would be chasing after the wrong horse here.

I.e. you might receive the InputStream from somewhere else and it might be actually a download from some external content via the Internet and the line might be slow and thus all the reading just takes ages. Or it might be something with the disk-setup or memory shortage where lots of GC is running because you are near the limit, ...

One other option would be to extract the smallest possible snippet of code which reproduces this, then you can see what else you need to remove to make it run faster.



回答2:

Use csv format instead. We had the same issue using POI library and we are now downloading the csv file instead of an excel file, which is very light and is downloaded quickly.