How to speed up Excel reading/writing

Asked 2019-08-01 12:10

I am using Apache POI to read/write an Excel file for my company as an intern here. My program goes through the Excel file, which is a big grid with computer names along the top row and user names down the left column: 240 computers and 342 users. sheet[computer][user] starts at 0 everywhere; the program calls PSLoggedon for each computer, takes the username(s) currently logged on, and increments their 0, so after running it for a month it shows who logs in the most to each computer. So far it runs in about 25 minutes, since I use a socket to check socket.connect before actually calling PSLoggedon.

Without reading or writing the Excel file at all, just making the PSLoggedon calls to each computer takes about 9 minutes, so the reading and writing apparently takes 10-15 minutes. The thing is, I call PSLoggedon on a computer, then open the Excel file to find the [x][y] spot for that [computer][user], write a +=1 to it, and close the file again. I suppose the reason it takes this long is that the file is opened and closed so many times? I could be completely wrong. But I can't think of a way to make it faster by doing all the reading and writing at once, opening and closing the file only once. Any ideas?

3 Answers
等我变得足够好 · answered 2019-08-01 12:48
  1. Get HSQLDB (or another in-process database, but this is what I've used in the past). Add it to your build.
  2. You can now create either a file-based or in-memory database (I would use file-based, as it lets you persist state between runs) simply by using JDBC. Create a table with the columns User, Computer, Count.
  3. In your reading thread(s), INSERT or UPDATE your table whenever you find a user with PSLoggedon.
  4. Once your data collection is complete, you can SELECT Computer, User, Count from Data ORDER BY Computer, User (or switch the order depending on your excel file layout), loop through the ResultSet and write the results directly.
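The steps above can be sketched in SQL. This is only a sketch: the table name Data and the ORDER BY come from step 4, the column sizes are illustrative, and the identifiers are quoted because User (and in some dialects Count) is a reserved word.

```sql
-- Step 2: the counts table, one row per (computer, user) pair.
CREATE TABLE Data (
    "Computer" VARCHAR(64) NOT NULL,
    "User"     VARCHAR(64) NOT NULL,
    "Count"    INTEGER DEFAULT 0 NOT NULL,
    PRIMARY KEY ("Computer", "User")
);

-- Step 3: on each PSLoggedon hit, try the UPDATE first and issue the
-- INSERT only if no row was affected (a portable upsert pattern).
UPDATE Data SET "Count" = "Count" + 1
 WHERE "Computer" = ? AND "User" = ?;

INSERT INTO Data ("Computer", "User", "Count") VALUES (?, ?, 1);

-- Step 4: read everything back in spreadsheet order.
SELECT "Computer", "User", "Count"
  FROM Data
 ORDER BY "Computer", "User";
```

With a file-based JDBC URL the table survives between runs, so the monthly totals accumulate without ever re-reading the spreadsheet.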
祖国的老花朵 · answered 2019-08-01 12:53

This is an old question, but from what I see:

  1. Since you are sampling and using Excel, is it safe to assume that consistency and atomicity isn't critical? You're just estimating fractional usage and don't care if a user logged in and logged out between observations.
  2. Is the Excel file stored over a slow network link? Opening and closing a file 240 times could bring significant overhead. How about the following:
  3. You need to open the Excel file once to get the list of computers. At that time, just snapshot the entire contents of the matrix into a Map<ComputerName, Map<UserName, Count>>. Also get a List<ComputerName> and List<UserName> to remember the row/column headings. The entire spreadsheet holds fewer than 90,000 integers: no need to bring in heavy database machinery.
  4. 9 minutes for 240 computers, single-threaded, is roughly 2.25 seconds per computer. Is that the expected throughput of PSLoggedOn? Can you create a thread pool and query all 240 computers at once or in a small number of rounds?
  5. Then, parse the results, increment your map, and dump it back to the Excel file. Is there a possibility that you might see new users that were not previously in the file? Those will need to be added to the Map and the List<UserName>.
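The snapshot-and-thread-pool idea above can be sketched with plain JDK classes. Everything here is illustrative: psLoggedOn is a hypothetical stand-in for shelling out to the real PSLoggedon tool, and the computer names are made up.

```java
import java.util.*;
import java.util.concurrent.*;

public class UsageCounter {

    // Hypothetical stand-in for PSLoggedon: the real version would run the
    // tool against the given host and parse the logged-on usernames.
    static List<String> psLoggedOn(String computer) {
        return List.of("alice");
    }

    // Query all computers through a fixed-size thread pool, accumulating
    // counts in a concurrent nested map: computer -> (user -> count).
    static Map<String, Map<String, Integer>> collect(List<String> computers,
                                                     int threads) throws Exception {
        Map<String, Map<String, Integer>> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> tasks = new ArrayList<>();
        for (String pc : computers) {
            tasks.add(pool.submit(() -> {
                Map<String, Integer> perPc =
                        counts.computeIfAbsent(pc, k -> new ConcurrentHashMap<>());
                for (String user : psLoggedOn(pc)) {
                    perPc.merge(user, 1, Integer::sum); // increment, starting at 1
                }
            }));
        }
        for (Future<?> t : tasks) t.get(); // wait for all probes to finish
        pool.shutdown();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Map<String, Integer>> counts =
                collect(List.of("PC-01", "PC-02"), 16);
        System.out.println(counts.get("PC-01").get("alice")); // prints 1
    }
}
```

With the counts in memory, the Excel file is opened exactly twice per run: once to snapshot the matrix and once to write the merged totals back.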
我想做一个坏孩纸 · answered 2019-08-01 13:10

Normally Apache POI is very fast; if you are running into performance issues, check the following points:

  1. POI's logging might be on; you can turn it off by adding this -D option to your JVM settings:

    -Dorg.apache.poi.util.POILogger=org.apache.poi.util.NullLogger

  2. Your JVM heap may be set too low; try increasing it (for example with -Xmx).

  3. Prefer XLS (the binary HSSF format) over XLSX (XSSF): the XML-based XLSX format is slower to parse and write. If you must produce large XLSX files, POI's streaming SXSSF API keeps memory usage down when writing.
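As an alternative to the command-line flag in point 1, the same system property can be set programmatically. This is a sketch: it must run before any POI class is loaded, and whether POI honors it depends on your POI version (newer releases replaced the POILogger mechanism with standard logging frameworks).

```java
public class DisablePoiLogging {
    public static void main(String[] args) {
        // Equivalent to -Dorg.apache.poi.util.POILogger=...NullLogger on the
        // command line; must be set before the first POI class initializes.
        System.setProperty("org.apache.poi.util.POILogger",
                           "org.apache.poi.util.NullLogger");
        System.out.println(System.getProperty("org.apache.poi.util.POILogger"));
        // prints org.apache.poi.util.NullLogger
    }
}
```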
