java get file size efficiently

Posted 2019-01-02 20:12

While googling, I see that java.io.File#length() can be slow. FileChannel also offers a size() method.

Is there an efficient way in java to get the file size?
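
For context, the calls in question (plus the java.nio.file.Files.size() alternative available since Java 7) look roughly like this; a sketch for illustration, not a benchmark:

import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FileSizeWays {
    public static void main(String[] args) throws Exception {
        String path = args[0];

        // java.io.File#length()
        long viaFile = new File(path).length();

        // FileChannel#size()
        long viaChannel;
        try (FileChannel channel = new FileInputStream(path).getChannel()) {
            viaChannel = channel.size();
        }

        // NIO: Files.size() (Java 7+)
        long viaNio = Files.size(Paths.get(path));

        System.out.println(viaFile + " " + viaChannel + " " + viaNio);
    }
}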

Tags: java filesize
9 Answers
骚的不知所云
Answer 2 · 2019-01-02 20:22

The benchmark given by GHad measures lots of other things (such as reflection, instantiating objects, etc.) besides getting the length. If we strip those out, then for a single call I get the following times in microseconds:

   file sum___19.0, per Iteration___19.0
    raf sum___16.0, per Iteration___16.0
channel sum__273.0, per Iteration__273.0

For 100 runs and 10000 iterations I get:

   file sum__1767629.0, per Iteration__1.7676290000000001
    raf sum___881284.0, per Iteration__0.8812840000000001
channel sum___414286.0, per Iteration__0.414286

I ran the following modified code, giving the name of a 100 MB file as an argument.

import java.io.*;
import java.nio.channels.*;
import java.util.*;

public class FileSizeBench {

  private static File file;
  private static FileChannel channel;
  private static RandomAccessFile raf;

  public static void main(String[] args) throws Exception {
    int runs = 1;
    int iterations = 1;

    file = new File(args[0]);
    channel = new FileInputStream(args[0]).getChannel();
    raf = new RandomAccessFile(args[0], "r");

    HashMap<String, Double> times = new HashMap<String, Double>();
    times.put("file", 0.0);
    times.put("channel", 0.0);
    times.put("raf", 0.0);

    long start;
    for (int i = 0; i < runs; ++i) {
      // Reference length; each approach below must agree with it.
      long l = file.length();

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != file.length()) throw new Exception();
      times.put("file", times.get("file") + System.nanoTime() - start);

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != channel.size()) throw new Exception();
      times.put("channel", times.get("channel") + System.nanoTime() - start);

      start = System.nanoTime();
      for (int j = 0; j < iterations; ++j)
        if (l != raf.length()) throw new Exception();
      times.put("raf", times.get("raf") + System.nanoTime() - start);
    }
    for (Map.Entry<String, Double> entry : times.entrySet()) {
        System.out.println(
            entry.getKey() + " sum: " + 1e-3 * entry.getValue() +
            ", per Iteration: " + (1e-3 * entry.getValue() / runs / iterations));
    }
  }
}
弹指情弦暗扣
Answer 3 · 2019-01-02 20:24

I ran into this same issue. I needed to get the file size and modified date of 90,000 files on a network share. Using Java, and being as minimalistic as possible, it took a very long time. (I also needed to get the URL from the file and the path of the object, so it varied somewhat, but more than an hour.) I then used a native Win32 executable to do the same task, just dumping the file path, modified date, and size to the console, and executed that from Java. The speed was amazing: the native process, plus my string handling to read the data, could process over 1000 items a second.

So even though people down-voted the above comment, this is a valid solution, and it did solve my issue. In my case I knew ahead of time which folders I needed the sizes of, and I could pass them on the command line to my Win32 app. Processing a directory went from hours to minutes.

The issue also seemed to be Windows-specific; OS X did not have the same problem and could access network file info as fast as the OS could return it.

Java file handling on Windows is terrible for network shares; local disk access is fine. It was only network shares that caused the terrible performance. Windows itself could get info on the network share and calculate the total size in under a minute, too.
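
Not the original tool, but a minimal sketch of how such a native helper could be driven from Java; the executable name list_sizes.exe, its arguments, and its pipe-delimited output format are all assumptions for illustration:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class NativeSizeLister {
    public static void main(String[] args) throws Exception {
        // Hypothetical native helper that prints "path|modified|size" per line.
        ProcessBuilder pb = new ProcessBuilder("list_sizes.exe", "\\\\server\\share\\folder");
        pb.redirectErrorStream(true);
        Process p = pb.start();

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\\|");
                if (parts.length == 3) {
                    String path = parts[0];
                    String modified = parts[1];
                    long size = Long.parseLong(parts[2]);
                    System.out.println(path + " (" + size + " bytes, modified " + modified + ")");
                }
            }
        }
        p.waitFor();
    }
}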

--Ben

余生请多指教
Answer 4 · 2019-01-02 20:25

When I modify your code to use a file accessed by an absolute path instead of a resource, I get a different result (for 1 run, 1 iteration, and a 100,000-byte file; times for a 10-byte file are identical to those for 100,000 bytes):

LENGTH sum: 33, per Iteration: 33.0
CHANNEL sum: 3626, per Iteration: 3626.0
URL sum: 294, per Iteration: 294.0

妖精总统
Answer 5 · 2019-01-02 20:25

From GHad's benchmark, there are a few issues people have mentioned:

1. As BalusC mentioned, stream.available() is flawed in this case, because available() only returns an estimate of the number of bytes that can be read (or skipped over) from the input stream without blocking by the next invocation of a method on that stream. So the first step is to remove the URL/stream approach (see the sketch after this list).

2. As StuartH mentioned, the order in which the tests run also makes a difference because of caching, so take that out of the picture by running each test separately.
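
For reference, a rough sketch of the URL/stream variant being removed, modeled on GHad's resource-based setup (the class name here is just illustrative); the point is that available() reports an estimate, not the file length:

import java.io.InputStream;
import java.net.URL;

public class AvailableIsNotLength {
    public static void main(String[] args) throws Exception {
        URL url = AvailableIsNotLength.class.getResource("AvailableIsNotLength.class");
        try (InputStream stream = url.openStream()) {
            // available() estimates how many bytes can be read without blocking;
            // it is not guaranteed to equal the size of the underlying file.
            System.out.println("available() estimate: " + stream.available());
        }
    }
}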


Now to the tests.

When the CHANNEL test runs alone:

CHANNEL sum: 59691, per Iteration: 238.764

When the LENGTH test runs alone:

LENGTH sum: 48268, per Iteration: 193.072

So it looks like LENGTH is the winner here:

@Override
public long getResult() throws Exception {
    // Resolve this class's own .class file on disk and ask File for its length.
    File me = new File(FileSizeBench.class.getResource(
            "FileSizeBench.class").getFile());
    return me.length();
}
春风洒进眼中
Answer 6 · 2019-01-02 20:27

If you want the file size of multiple files in a directory, use Files.walkFileTree. You can obtain the size from the BasicFileAttributes handed to your visitor.

This is much faster than calling .length() on the result of File.listFiles() or using Files.size() on the result of Files.newDirectoryStream(). In my test cases it was about 100 times faster.
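
A minimal sketch of that approach, assuming the directory is passed as the first argument and the sizes are simply summed for illustration:

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class DirectorySize {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]);
        final long[] total = {0};

        // The size comes with the BasicFileAttributes passed to visitFile,
        // so no extra stat call per file is needed.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                total[0] += attrs.size();
                return FileVisitResult.CONTINUE;
            }
        });

        System.out.println("Total size: " + total[0] + " bytes");
    }
}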

查无此人
Answer 7 · 2019-01-02 20:28

In response to rgrig's benchmark, the time taken to open and close the FileChannel and RandomAccessFile instances also needs to be taken into account, as these classes open a stream to read the file.

After modifying the benchmark, I got these results for 1 iteration on an 85 MB file:

file totalTime: 48000 (48 us)
raf totalTime: 261000 (261 us)
channel totalTime: 7020000 (7 ms)

For 10000 iterations on the same file:

file totalTime: 80074000 (80 ms)
raf totalTime: 295417000 (295 ms)
channel totalTime: 368239000 (368 ms)

If all you need is the file size, file.length() is the fastest way to do it. If you plan to use the file for other purposes like reading/writing, then RAF seems to be a better bet. Just don't forget to close the file connection :-)

import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;

public class FileSizeBench
{    
    public static void main(String[] args) throws Exception
    {
        int iterations = 1;
        String fileEntry = args[0];

        Map<String, Long> times = new HashMap<String, Long>();
        times.put("file", 0L);
        times.put("channel", 0L);
        times.put("raf", 0L);

        long fileSize;
        long start;
        long end;
        File f1;
        FileChannel channel;
        RandomAccessFile raf;

        for (int i = 0; i < iterations; i++)
        {
            // file.length()
            start = System.nanoTime();
            f1 = new File(fileEntry);
            fileSize = f1.length();
            end = System.nanoTime();
            times.put("file", times.get("file") + end - start);

            // channel.size()
            start = System.nanoTime();
            channel = new FileInputStream(fileEntry).getChannel();
            fileSize = channel.size();
            channel.close();
            end = System.nanoTime();
            times.put("channel", times.get("channel") + end - start);

            // raf.length()
            start = System.nanoTime();
            raf = new RandomAccessFile(fileEntry, "r");
            fileSize = raf.length();
            raf.close();
            end = System.nanoTime();
            times.put("raf", times.get("raf") + end - start);
        }

        for (Map.Entry<String, Long> entry : times.entrySet()) {
            System.out.println(entry.getKey() + " totalTime: " + entry.getValue() + " (" + getTime(entry.getValue()) + ")");
        }
    }

    public static String getTime(Long timeTaken)
    {
        if (timeTaken < 1000) {
            return timeTaken + " ns";
        } else if (timeTaken < (1000*1000)) {
            return timeTaken/1000 + " us"; 
        } else {
            return timeTaken/(1000*1000) + " ms";
        } 
    }
}