While googling, I see that using java.io.File#length()
can be slow.
FileChannel
has a size()
method that is available as well.
Is there an efficient way in java to get the file size?
While googling, I see that using java.io.File#length()
can be slow.
FileChannel
has a size()
method that is available as well.
Is there an efficient way in java to get the file size?
Actually, I think the "ls" may be faster. There are definitely some issues in Java dealing with getting File info. Unfortunately there is no equivalent safe method of recursive ls for Windows. (cmd.exe's DIR /S can get confused and generate errors in infinite loops)
On XP, accessing a server on the LAN, it takes me 5 seconds in Windows to get the count of the files in a folder (33,000), and the total size.
When I iterate recursively through this in Java, it takes me over 5 minutes. I started measuring the time it takes to do file.length(), file.lastModified(), and file.toURI() and what I found is that 99% of my time is taken by those 3 calls. The 3 calls I actually need to do...
The difference for 1000 files is 15ms local versus 1800ms on server. The server path scanning in Java is ridiculously slow. If the native OS can be fast at scanning that same folder, why can't Java?
As a more complete test, I used WineMerge on XP to compare the modified date, and size of the files on the server versus the files locally. This was iterating over the entire directory tree of 33,000 files in each folder. Total time, 7 seconds. java: over 5 minutes.
So the original statement and question from the OP is true, and valid. Its less noticeable when dealing with a local file system. Doing a local compare of the folder with 33,000 items takes 3 seconds in WinMerge, and takes 32 seconds locally in Java. So again, java versus native is a 10x slowdown in these rudimentary tests.
Java 1.6.0_22 (latest), Gigabit LAN, and network connections, ping is less than 1ms (both in the same switch)
Java is slow.
All the test cases in this post are flawed as they access the same file for each method tested. So disk caching kicks in which tests 2 and 3 benefit from. To prove my point I took test case provided by GHAD and changed the order of enumeration and below are the results.
Looking at result I think File.length() is the winner really.
Order of test is the order of output. You can even see the time taken on my machine varied between executions but File.Length() when not first, and incurring first disk access won.
Well, I tried to measure it up with the code below:
For runs = 1 and iterations = 1 the URL method is fastest most times followed by channel. I run this with some pause fresh about 10 times. So for one time access, using the URL is the fastest way I can think of:
For runs = 5 and iterations = 50 the picture draws different.
File must be caching the calls to the filesystem, while channels and URL have some overhead.
Code: