List all files from a directory recursively with J

2019-01-03 08:34发布

I have this function that prints the name of all the files in a directory recursively. The problem is that my code is very slow because it has to access a remote network device with every iteration.

My plan is to first load all the files from the directory recursively and then after that go through all files with the regex to filter out all the files I don't want. Does anyone have a better suggestion?

public static printFnames(String sDir){
  File[] faFiles = new File(sDir).listFiles();
  for(File file: faFiles){
    if(file.getName().matches("^(.*?)")){
      System.out.println(file.getAbsolutePath());
    }
    if(file.isDirectory()){
      printFnames(file.getAbsolutePath());
    }
  }
}

This is just a test later on I'm not going to use the code like this, instead I'm going to add the path and modification date of every file which matches an advanced regex to an array.

标签: java file-io
16条回答
ら.Afraid
2楼-- · 2019-01-03 09:15

Assuming this is actual production code you'll be writing, then I suggest using the solution to this sort of thing that's already been solved - Apache Commons IO, specifically FileUtils.listFiles(). It handles nested directories, filters (based on name, modification time, etc).

For example, for your regex:

Collection files = FileUtils.listFiles(
  dir, 
  new RegexFileFilter("^(.*?)"), 
  DirectoryFileFilter.DIRECTORY
);

This will recursively search for files matching the ^(.*?) regex, returning the results as a collection.

It's worth noting that this will be no faster than rolling your own code, it's doing the same thing - trawling a filesystem in Java is just slow. The difference is, the Apache Commons version will have no bugs in it.

查看更多
时光不老,我们不散
3楼-- · 2019-01-03 09:16

it feels like it's stupid access the filesystem and get the contents for every subdirectory instead of getting everything at once.

Your feeling is wrong. That's how filesystems work. There is no faster way (except when you have to do this repeatedly or for different patterns, you can cache all the file paths in memory, but then you have to deal with cache invalidation i.e. what happens when files are added/removed/renamed while the app runs).

查看更多
Bombasti
4楼-- · 2019-01-03 09:20

This will work fine

public void displayAll(File path){      
    if(path.isFile()){
        System.out.println(path.getName());
    }else{
        System.out.println(path.getName());         
        File files[] = path.listFiles();
        for(File dirOrFile: files){
            displayAll(dirOrFile);
        }
    }
}

查看更多
乱世女痞
5楼-- · 2019-01-03 09:22

Java's interface for reading filesystem folder contents is not very performant (as you've discovered). JDK 7 fixes this with a completely new interface for this sort of thing, which should bring native level performance to these sorts of operations.

The core issue is that Java makes a native system call for every single file. On a low latency interface, this is not that big of a deal - but on a network with even moderate latency, it really adds up. If you profile your algorithm above, you'll find that the bulk of the time is spent in the pesky isDirectory() call - that's because you are incurring a round trip for every single call to isDirectory(). Most modern OSes can provide this sort of information when the list of files/folders was originally requested (as opposed to querying each individual file path for it's properties).

If you can't wait for JDK7, one strategy for addressing this latency is to go multi-threaded and use an ExecutorService with a maximum # of threads to perform your recursion. It's not great (you have to deal with locking of your output data structures), but it'll be a heck of a lot faster than doing this single threaded.

In all of your discussions about this sort of thing, I highly recommend that you compare against the best you could do using native code (or even a command line script that does roughly the same thing). Saying that it takes an hour to traverse a network structure doesn't really mean that much. Telling us that you can do it native in 7 second, but it takes an hour in Java will get people's attention.

查看更多
看我几分像从前
6楼-- · 2019-01-03 09:26

With Java 7 a faster way to walk thru a directory tree was introduced with the Paths and Files functionality. They're much faster then the "old" File way.

This would be the code to walk thru and check path names with a regular expression:

public final void test() throws IOException, InterruptedException {
    final Path rootDir = Paths.get("path to your directory where the walk starts");

    // Walk thru mainDir directory
    Files.walkFileTree(rootDir, new FileVisitor<Path>() {
        // First (minor) speed up. Compile regular expression pattern only one time.
        private Pattern pattern = Pattern.compile("^(.*?)");

        @Override
        public FileVisitResult preVisitDirectory(Path path,
                BasicFileAttributes atts) throws IOException {

            boolean matches = pattern.matcher(path.toString()).matches();

            // TODO: Put here your business logic when matches equals true/false

            return (matches)? FileVisitResult.CONTINUE:FileVisitResult.SKIP_SUBTREE;
        }

        @Override
        public FileVisitResult visitFile(Path path, BasicFileAttributes mainAtts)
                throws IOException {

            boolean matches = pattern.matcher(path.toString()).matches();

            // TODO: Put here your business logic when matches equals true/false

            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult postVisitDirectory(Path path,
                IOException exc) throws IOException {
            // TODO Auto-generated method stub
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path path, IOException exc)
                throws IOException {
            exc.printStackTrace();

            // If the root directory has failed it makes no sense to continue
            return path.equals(rootDir)? FileVisitResult.TERMINATE:FileVisitResult.CONTINUE;
        }
    });
}
查看更多
Lonely孤独者°
7楼-- · 2019-01-03 09:26

I personally like this version of FileUtils. Here's an example that finds all mp3s or flacs in a directory or any of its subdirectories:

String[] types = {"mp3", "flac"};
Collection<File> files2 = FileUtils.listFiles(/path/to/your/dir, types , true);
查看更多
登录 后发表回答