java get file size efficiently

Posted 2019-01-02 20:12

While googling, I see that java.io.File#length() can be slow. FileChannel also offers a size() method.

Is there an efficient way in java to get the file size?
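
For context, the calls in question (plus the java.nio.file.Files.size() alternative available since Java 7) look roughly like this; a sketch for illustration, not a benchmark:

import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FileSizeWays {
    public static void main(String[] args) throws Exception {
        String path = args[0];

        // java.io.File#length()
        long viaFile = new File(path).length();

        // FileChannel#size()
        long viaChannel;
        try (FileChannel channel = new FileInputStream(path).getChannel()) {
            viaChannel = channel.size();
        }

        // NIO: Files.size() (Java 7+)
        long viaNio = Files.size(Paths.get(path));

        System.out.println(viaFile + " " + viaChannel + " " + viaNio);
    }
}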

Tags: java filesize
9 Answers
骚的不知所云
Answer 2 · 2019-01-02 20:22

The benchmark given by GHad measures lots of other things (such as reflection, instantiating objects, etc.) besides getting the length. If we strip those out, then for a single call I get the following times in microseconds:

   file sum___19.0, per Iteration___19.0
    raf sum___16.0, per Iteration___16.0
channel sum__273.0, per Iteration__273.0

For 100 runs and 10000 iterations I get:

   file sum__1767629.0, per Iteration__1.7676290000000001
    raf sum___881284.0, per Iteration__0.8812840000000001
channel sum___414286.0, per Iteration__0.414286

I ran the following modified code, giving the name of a 100 MB file as an argument.

import java.io.*;
import java.nio.channels.*;
import java.util.*;

public class FileSizeBench {

  private static File file;
  private static FileChannel channel;
  private static RandomAccessFile raf;

  public static void main(String[] args) throws Exception {
    int runs = 1;
    int iterations = 1;

    file = new File(args[0]);
    channel = new FileInputStream(args[0]).getChannel();
    raf = new RandomAccessFile(args[0], "r");

    HashMap<String, Double> times = new HashMap<String, Double>();
    times.put("file", 0.0);
    times.put("channel", 0.0);
    times.put("raf", 0.0);

    long start;
    for (int i = 0; i < runs; ++i) {
      // Reference length; each approach below must agree with it.
      long l = file.length();

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != file.length()) throw new Exception();
      times.put("file", times.get("file") + System.nanoTime() - start);

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != channel.size()) throw new Exception();
      times.put("channel", times.get("channel") + System.nanoTime() - start);

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != raf.length()) throw new Exception();
      times.put("raf", times.get("raf") + System.nanoTime() - start);
    }
    for (Map.Entry<String, Double> entry : times.entrySet()) {
        System.out.println(
            entry.getKey() + " sum: " + 1e-3 * entry.getValue() +
            ", per Iteration: " + (1e-3 * entry.getValue() / runs / iterations));
    }
  }
}
弹指情弦暗扣
Answer 3 · 2019-01-02 20:24

I ran into this same issue. I needed to get the file size and modified date of 90,000 files on a network share. Using Java, and being as minimalistic as possible, it took a very long time. (I also needed to get the URL from the file and the path of the object, so it varied somewhat, but more than an hour.) I then used a native Win32 executable to do the same task, just dumping the file path, modified date, and size to the console, and executed that from Java. The speed was amazing: the native process, plus my string handling to read the data, could process over 1000 items a second.

So even though people down-voted the above comment, this is a valid solution, and it did solve my issue. In my case I knew ahead of time which folders I needed the sizes of, and I could pass them on the command line to my Win32 app. Processing a directory went from hours to minutes.

The issue also seemed to be Windows-specific; OS X did not have the same problem and could access network file info as fast as the OS could return it.

Java file handling on Windows is terrible for network shares; local disk access is fine. It was only network shares that caused the terrible performance. Windows itself could get info on the network share and calculate the total size in under a minute, too.
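
Not the original tool, but a minimal sketch of how such a native helper could be driven from Java; the executable name list_sizes.exe, its arguments, and its pipe-delimited output format are all assumptions for illustration:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class NativeSizeLister {
    public static void main(String[] args) throws Exception {
        // Hypothetical native helper that prints "path|modified|size" per line.
        ProcessBuilder pb = new ProcessBuilder("list_sizes.exe", "\\\\server\\share\\folder");
        pb.redirectErrorStream(true);
        Process p = pb.start();

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\\|");
                if (parts.length == 3) {
                    String path = parts[0];
                    String modified = parts[1];
                    long size = Long.parseLong(parts[2]);
                    System.out.println(path + " (" + size + " bytes, modified " + modified + ")");
                }
            }
        }
        p.waitFor();
    }
}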

--Ben

余生请多指教
Answer 4 · 2019-01-02 20:25

When I modify your code to use a file accessed by an absolute path instead of a resource, I get a different result (for 1 run, 1 iteration, and a 100,000-byte file; times for a 10-byte file are identical to those for 100,000 bytes):

LENGTH sum: 33, per Iteration: 33.0
CHANNEL sum: 3626, per Iteration: 3626.0
URL sum: 294, per Iteration: 294.0

妖精总统
Answer 5 · 2019-01-02 20:25

From GHad's benchmark, there are a few issues people have mentioned:

1. As BalusC mentioned, stream.available() is flawed in this case, because available() only returns an estimate of the number of bytes that can be read (or skipped over) from the input stream without blocking by the next invocation of a method on that stream. So the first step is to remove the URL/stream approach (see the sketch after this list).

2. As StuartH mentioned, the order in which the tests run also makes a difference because of caching, so take that out of the picture by running each test separately.
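
For reference, a rough sketch of the URL/stream variant being removed, modeled on GHad's resource-based setup (the class name here is just illustrative); the point is that available() reports an estimate, not the file length:

import java.io.InputStream;
import java.net.URL;

public class AvailableIsNotLength {
    public static void main(String[] args) throws Exception {
        URL url = AvailableIsNotLength.class.getResource("AvailableIsNotLength.class");
        try (InputStream stream = url.openStream()) {
            // available() estimates how many bytes can be read without blocking;
            // it is not guaranteed to equal the size of the underlying file.
            System.out.println("available() estimate: " + stream.available());
        }
    }
}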


Now to the tests.

When the CHANNEL test runs alone:

CHANNEL sum: 59691, per Iteration: 238.764

When the LENGTH test runs alone:

LENGTH sum: 48268, per Iteration: 193.072

So it looks like LENGTH is the winner here:

@Override
public long getResult() throws Exception {
    // Resolve this class's own .class file on disk and ask File for its length.
    File me = new File(FileSizeBench.class.getResource(
            "FileSizeBench.class").getFile());
    return me.length();
}
春风洒进眼中
Answer 6 · 2019-01-02 20:27

If you want the file size of multiple files in a directory, use Files.walkFileTree. You can obtain the size from the BasicFileAttributes handed to your visitor.

This is much faster than calling .length() on the result of File.listFiles() or using Files.size() on the result of Files.newDirectoryStream(). In my test cases it was about 100 times faster.
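
A minimal sketch of that approach, assuming the directory is passed as the first argument and the sizes are simply summed for illustration:

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class DirectorySize {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]);
        final long[] total = {0};

        // The size comes with the BasicFileAttributes passed to visitFile,
        // so no extra stat call per file is needed.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                total[0] += attrs.size();
                return FileVisitResult.CONTINUE;
            }
        });

        System.out.println("Total size: " + total[0] + " bytes");
    }
}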

查无此人
Answer 7 · 2019-01-02 20:28

In response to rgrig's benchmark, the time taken to open and close the FileChannel and RandomAccessFile instances also needs to be taken into account, as these classes open a stream to read the file.

After modifying the benchmark, I got these results for 1 iteration on an 85 MB file:

file totalTime: 48000 (48 us)
raf totalTime: 261000 (261 us)
channel totalTime: 7020000 (7 ms)

For 10000 iterations on the same file:

file totalTime: 80074000 (80 ms)
raf totalTime: 295417000 (295 ms)
channel totalTime: 368239000 (368 ms)

If all you need is the file size, file.length() is the fastest way to do it. If you plan to use the file for other purposes like reading/writing, then RAF seems to be a better bet. Just don't forget to close the file connection :-)

import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;

public class FileSizeBench
{    
    public static void main(String[] args) throws Exception
    {
        int iterations = 1;
        String fileEntry = args[0];

        Map<String, Long> times = new HashMap<String, Long>();
        times.put("file", 0L);
        times.put("channel", 0L);
        times.put("raf", 0L);

        long fileSize;
        long start;
        long end;
        File f1;
        FileChannel channel;
        RandomAccessFile raf;

        for (int i = 0; i < iterations; i++)
        {
            // file.length()
            start = System.nanoTime();
            f1 = new File(fileEntry);
            fileSize = f1.length();
            end = System.nanoTime();
            times.put("file", times.get("file") + end - start);

            // channel.size()
            start = System.nanoTime();
            channel = new FileInputStream(fileEntry).getChannel();
            fileSize = channel.size();
            channel.close();
            end = System.nanoTime();
            times.put("channel", times.get("channel") + end - start);

            // raf.length()
            start = System.nanoTime();
            raf = new RandomAccessFile(fileEntry, "r");
            fileSize = raf.length();
            raf.close();
            end = System.nanoTime();
            times.put("raf", times.get("raf") + end - start);
        }

        for (Map.Entry<String, Long> entry : times.entrySet()) {
            System.out.println(entry.getKey() + " totalTime: " + entry.getValue() + " (" + getTime(entry.getValue()) + ")");
        }
    }

    public static String getTime(Long timeTaken)
    {
        if (timeTaken < 1000) {
            return timeTaken + " ns";
        } else if (timeTaken < (1000*1000)) {
            return timeTaken/1000 + " us"; 
        } else {
            return timeTaken/(1000*1000) + " ms";
        } 
    }
}