While googling, I see that using java.io.File#length() can be slow. FileChannel has a size() method that is available as well. Is there an efficient way in Java to get the file size?
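For reference, here are the standard-library calls the answers below compare, side by side. This is a minimal sketch; the class and method names are hypothetical. File#length() and Files.size() work from the path alone, while FileChannel#size() and RandomAccessFile#length() need an open file, which try-with-resources closes again.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FileSizeOptions {

    // java.io.File#length(): works from the path; returns 0L if the file does not exist.
    static long viaFileLength(String name) {
        return new File(name).length();
    }

    // java.nio.file.Files#size(): throws an IOException instead of returning 0L for a missing file.
    static long viaFilesSize(String name) throws IOException {
        return Files.size(Paths.get(name));
    }

    // FileChannel#size(): needs an open channel, closed again by try-with-resources.
    static long viaChannel(String name) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get(name), StandardOpenOption.READ)) {
            return channel.size();
        }
    }

    // RandomAccessFile#length(): also needs an open file, closed again by try-with-resources.
    static long viaRandomAccessFile(String name) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(name, "r")) {
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java FileSizeOptions <file>");
            return;
        }
        String name = args[0];
        System.out.println("File.length():      " + viaFileLength(name));
        System.out.println("Files.size():       " + viaFilesSize(name));
        System.out.println("FileChannel.size(): " + viaChannel(name));
        System.out.println("RAF.length():       " + viaRandomAccessFile(name));
    }
}
```

All four should report the same number for an existing regular file; they differ mainly in how a missing file is reported and in whether a file handle has to be opened.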
The benchmark given by GHad measures lots of other things (such as reflection and object instantiation) besides getting the length. If we get rid of these, then for one call I get the following times in microseconds:
For 100 runs and 10,000 iterations I get:
I ran the following modified code, passing the name of a 100 MB file as an argument.
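The modified benchmark code itself is not reproduced here. As a rough, hypothetical stand-in (not the code behind the numbers above), the following sketch times File#length() against opening a FileChannel and calling size(), taking the file name as its first argument. A plain System.nanoTime() loop like this is easily distorted by JIT warm-up and OS caching; a harness such as JMH gives more trustworthy numbers.

```java
import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LengthVsChannelBench {

    // Results are written here so the JIT cannot treat the timed calls as dead code.
    static volatile long sink;

    // Average nanoseconds per File#length() call over `iterations` calls.
    static double avgLengthNanos(File file, int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += file.length();
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return (double) elapsed / iterations;
    }

    // Average nanoseconds per open + FileChannel#size() + close cycle.
    // The open/close cost is included on purpose: size() needs an open channel.
    static double avgChannelNanos(Path path, int iterations) throws IOException {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                total += ch.size();
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LengthVsChannelBench <file> [iterations]");
            return;
        }
        String name = args[0];
        int iterations = args.length > 1 ? Integer.parseInt(args[1]) : 10_000;
        // Whichever method runs second may benefit from warmer caches.
        System.out.printf("LENGTH  avg: %.1f ns%n", avgLengthNanos(new File(name), iterations));
        System.out.printf("CHANNEL avg: %.1f ns%n", avgChannelNanos(Paths.get(name), iterations));
    }
}
```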
I ran into this same issue. I needed to get the file size and modified date of 90,000 files on a network share. Using Java, and being as minimalistic as possible, it took a very long time. (I also needed to get the URL from the file and the path of the object, so the time varied somewhat, but it was more than an hour.) I then used a native Win32 executable to do the same task, just dumping each file's path, modified date, and size to the console, and executed that from Java. The speed was amazing: the native process, plus my string handling to read its output, could process over 1,000 items a second.
So even though people downvoted the above comment, this is a valid solution, and it did solve my issue. In my case I knew ahead of time which folders I needed the sizes of, and I could pass them on the command line to my Win32 app. I went from hours to minutes to process a directory.
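A minimal sketch of that approach, assuming a hypothetical native helper (filelister.exe is an invented name) that takes a folder on its command line and prints one tab-separated line per file: path, last-modified time in milliseconds, and size in bytes. The executable, its output format, and the UTF-8 output encoding are all assumptions; adjust the parser to whatever your tool actually prints.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NativeListing {

    // One record per output line of the native helper.
    static final class Entry {
        final String path;
        final long modifiedMillis;
        final long size;

        Entry(String path, long modifiedMillis, long size) {
            this.path = path;
            this.modifiedMillis = modifiedMillis;
            this.size = size;
        }
    }

    // Parses lines in the ASSUMED format "path<TAB>modifiedMillis<TAB>size";
    // blank or malformed lines are skipped.
    static List<Entry> parse(Reader in) throws IOException {
        List<Entry> entries = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split("\t");
            if (parts.length != 3) {
                continue;
            }
            try {
                entries.add(new Entry(parts[0],
                        Long.parseLong(parts[1].trim()),
                        Long.parseLong(parts[2].trim())));
            } catch (NumberFormatException e) {
                // not a data line; ignore it
            }
        }
        return entries;
    }

    // Starts the (hypothetical) native lister for one folder and parses its stdout.
    static List<Entry> listWithNativeTool(String executable, String folder)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(executable, folder)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        List<Entry> entries;
        // Assumes the helper writes UTF-8; a Windows console tool may use another code page.
        try (Reader out = new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8)) {
            entries = parse(out);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(executable + " exited with code " + exitCode);
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java NativeListing <filelister.exe> <folder>");
            return;
        }
        for (Entry e : listWithNativeTool(args[0], args[1])) {
            System.out.println(e.path + "\t" + e.modifiedMillis + "\t" + e.size);
        }
    }
}
```

Keeping the parser separate from the process handling makes it easy to test against captured output before wiring in the real executable.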
The issue also seemed to be Windows-specific. OS X did not have the same issue and could access network file info as fast as the OS itself could.
Java file handling on Windows is terrible. Local disk access is fine, though; it was just network shares that caused the terrible performance. Windows itself could get info on the network share and calculate the total size in under a minute.
--Ben
When I modify your code to access the file by an absolute path instead of as a resource, I get a different result (for 1 run, 1 iteration, and a 100,000-byte file; times for a 10-byte file are identical):
LENGTH sum: 33, per Iteration: 33.0
CHANNEL sum: 3626, per Iteration: 3626.0
URL sum: 294, per Iteration: 294.0
From GHad's benchmark, there are a few issues people have mentioned:
1. As BalusC mentioned, stream.available() is flawed in this case, because available() returns only an estimate of the number of bytes that can be read (or skipped over) from the input stream without blocking on the next invocation of a method for that stream. So, first, remove the URL approach.
2. As StuartH mentioned, the order in which the tests run also affects caching, so eliminate that by running each test separately.
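If the size is needed for something located through a URL (as in the URL variant of that benchmark), one option is to ask for an actual length instead of relying on available(). A minimal sketch with a hypothetical class name: file: URLs are converted to a Path and measured with Files.size(); other protocols fall back to URLConnection#getContentLengthLong(), which returns -1 when the length is unknown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ResourceSize {

    // Returns the size of the resource behind `url`, or -1 if it cannot be determined.
    static long resourceSize(URL url) throws IOException, URISyntaxException {
        if ("file".equals(url.getProtocol())) {
            // Plain file on disk: ask the file system directly.
            return Files.size(Paths.get(url.toURI()));
        }
        // Other protocols (jar:, http:, ...): use the length the connection reports,
        // which is -1 when unknown. The stream is opened and closed so the
        // connection's resources are released afterwards.
        URLConnection conn = url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            return conn.getContentLengthLong();
        }
    }

    public static void main(String[] args) throws Exception {
        // Example: this class's own .class file, if it is visible as a resource.
        URL url = ResourceSize.class.getResource("ResourceSize.class");
        if (url == null) {
            System.err.println("class file not available as a resource");
            return;
        }
        System.out.println(url + " -> " + resourceSize(url));
    }
}
```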
Now the tests:
When CHANNEL one run alone:
When LENGTH one run alone:
So it looks like LENGTH is the winner here:
If you want the sizes of multiple files in a directory, use Files.walkFileTree. You can obtain the size from the BasicFileAttributes that you'll receive. This is much faster than calling .length() on the result of File.listFiles() or using Files.size() on the result of Files.newDirectoryStream(). In my test cases it was about 100 times faster.

In response to rgrig's benchmark, the time taken to open and close the FileChannel and RandomAccessFile instances also needs to be taken into account, as these classes open a stream for reading the file.
After modifying the benchmark, I got these results for 1 iteration on an 85 MB file:
For 10,000 iterations on the same file:
If all you need is the file size, file.length() is the fastest way to do it. If you plan to use the file for other purposes like reading or writing, then RandomAccessFile (RAF) seems to be a better bet. Just don't forget to close the file connection :-)
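Returning to the Files.walkFileTree answer above, here is a minimal sketch, assuming you want a map from each regular file under a root directory to its size. The class and method names (DirectorySizes, fileSizes) are hypothetical, and the speedup quoted in that answer is the answerer's own measurement, not something this sketch guarantees.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectorySizes {

    // Walks `root` recursively and records the size of every regular file.
    // Sizes come from the BasicFileAttributes passed to visitFile, so no
    // separate File#length() or Files.size() call is made per file.
    static Map<Path, Long> fileSizes(Path root) throws IOException {
        Map<Path, Long> sizes = new LinkedHashMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    sizes.put(file, attrs.size());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, Long> sizes = fileSizes(root);
        long total = 0;
        for (long s : sizes.values()) {
            total += s;
        }
        System.out.println(sizes.size() + " files, " + total + " bytes under " + root);
    }
}
```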