Java8 Stream of files, how to control the closing

2020-03-03 09:03发布

问题:

Suppose I have a Java8 Stream<FileReader> and that I use that stream to map and such, how can I control the closing of the FileReaders used in the stream?

Note, that I may not have access to the individual FileReaders, for example:

filenames.map(File::new)
    .filter(File::exists)
    .map(f->{
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(f));
        } catch(Exception e) {}
            return Optional.ofNullable(br);
        })
    .filter(Optional::isPresent)
    .map(Optional::get)
    .flatMap(...something that reads the file contents...) // From here, the Stream doesn't content something that gives access to the FileReaders

After doing some other mappings, etc, I finally lose the FileReaders in the sequel.

I first thought the garbage collector is able to do it when needed, but I've experienced OS descriptor exhaustion when filenames is a long Stream.

回答1:

A general note on the use of FileReader: FileReader uses internally a FileInputStream which overrides finalize() and is therefore discouraged to use beacause of the impact it has on garbarge collection especially when dealing with lots of files.

Unless you're using a Java version prior to Java 7 you should use the java.nio.files API instead, creating a BufferedReader with

 Path path = Paths.get(filename);
 BufferedReader br = Files.newBufferedReader(path);

So the beginning of your stream pipeline should look more like

 filenames.map(Paths::get)
          .filter(Files::exists)
          .map(p -> {
        try {
            return Optional.of(Files.newBufferedReader(p));
        } catch (IOException e) {
            return Optional.empty();
        }
    }) 

Now to your problem:

Option 1

One way to preserve the original Reader would be to use a Tuple. A tuple (or any n-ary variation of it) is generally a good way to handle multiple results of a function application, as it's done in a stream pipeline:

class ReaderTuple<T> {
   final Reader first;
   final T second;
   ReaderTuple(Reader r, T s){
     first = r;
     second = s;
   }
}

Now you can map the FileReader to a Tuple with the second item being your current stream item:

 filenames.map(Paths::get)
  .filter(Files::exists)
  .map(p -> {
        try {
            return Optional.of(Files.newBufferedReader(p));
        } catch (IOException e) {
            return Optional.empty();
        }
    }) 
  .filter(Optional::isPresent)
  .map(Optional::get)
  .flatMap(r -> new ReaderTuple(r, yourOtherItem))
  ....
  .peek(rt -> {
    try { 
      rt.first.close()  //close the reader or use a try-with-resources
    } catch(Exception e){}
   })
  ... 

Problem with that approach is, that whenever an unchecked exception occurrs during stream execution betweem the flatMap and the peek, the readers might not be closed.

Option 2

An alternative to use a tuple is to put the code that requires the reader in a try-with-resources block. This approach has the advantage that you're in control to close all readers.

Example 1:

 filenames.map(Paths::get)
  .filter(Files::exists)
  .map(p -> {
        try (Reader r = new BufferedReader(new FileReader(p))){

            Stream.of(r)
            .... //put here your stream code that uses the stream

        } catch (IOException e) {
            return Optional.empty();
        }
    }) //reader is implicitly closed here
 .... //terminal operation here

Example 2:

filenames.map(Paths::get)
  .filter(Files::exists)
  .map(p -> {
        try {
            return Optional.of(Files.newBufferedReader(p));
        } catch (IOException e) {
            return Optional.empty();
        }
    }) 
 .filter(Optional::isPresent)
 .map(Optional::get)
 .flatMap(reader -> {
   try(Reader r = reader) {

      //read from your reader here and return the items to flatten

   } //reader is implicitly closed here
  }) 

Example 1 has the advantage that the reader gets certainly closed. Example 2 is safe unless you put something more between the the creation of the reader and the try-with-resources block that may fail.

I personally would go for Example 1, and put the code that is accessing the reader in a separate function so the code is better readable.



回答2:

Perhaps a better solution is to use a Consumer<FileReader> to consume each element in the stream.

Another problem you might be running into if there are a lot of files is the files will all be open at the same time. It might be better to close each one as soon as it's done.

Let's say you change the code above into a method that takes a Consumer<BufferedReader>

I probably wouldn't use a stream for this but we can use one anyway to show how one would use it.

public void readAllFiles( Consumer<BufferedReader> consumer){
    Objects.requireNonNull(consumer);

    filenames.map(File::new)
             .filter(File::exists)
             .forEach(f->{

                 try(BufferedReader br = new BufferedReader(new FileReader(f))){
                     consumer.accept(br);
                 } catch(Exception e) {
                     //handle exception
                 }
             });
}

This way we make sure we close each reader and can still support doing whatever the user wants.

For example this would still work

 readAllFiles( br -> System.out.println( br.lines().count()));


回答3:

So, if you only have non-binary files you could use something like this:

List<String> fileNames = Arrays.asList(
            "C:\\Users\\wowse\\hallo.txt",
            "C:\\Users\\wowse\\bye.txt");

fileNames.stream()
            .map(Paths::get)
            .filter(Files::exists)
            .flatMap(path -> {
                try {
                    return Files.lines(path);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            })
            .forEach(System.out::println);

If you have binary files which you can hold in memory you can try following approach.

fileNames.stream()
            .map(Paths::get)
            .filter(Files::exists)
            .map(path -> {
                try {
                    return Files.readAllBytes(path);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            })
            .filter(Objects::nonNull)
            .map(String::new)
            .forEach(System.out::println);

Other than that I think you would have to use some wrapper class, where I could suggest Map.Entry or the Pair from javafx so you don't have to use any external libraries.



回答4:

I know this is an old question, but there is a very nice solution that I found here and I don't think it's on SO.

The java.nio.file offers methods that let you do this cleanly:

filenames.map(Paths::get)
    // Nicer alternative to File::exists
    .filter(Files::exists)
    // This will automatically close the stream after each file is done reading
    .flatMap(path -> {
        try {
            return Files.lines(path);
        } catch (IOException e) {
            // Seamlessly handles error opening file, no need for filtering
            return Stream.empty();
        }
    })
    .map(/* Do something with each line... */)


回答5:

Just for the sake of the argument ( though I agree with Louis above ) : You can pass around the original Reader/InputStream ( or any object, but the case you provided is actually flawed programming, because you can pass the FileReader instead of encapsulating it with BufferedReader) using common-lang3 Pair Class. Jool is also a valid library that provides Tuple* classes.

Example :

filenames.map(File::new)
    .filter(File::exists)
    .map(f->{
        BufferedReader br = null;
        FileReader fr = null;
        try {
            fr = new FileReader(f)
            br = new BufferedReader(fr);
            return Optional.of(Pair.of(br,fr)) ;
        } catch(Exception e) {}
            return Optional.ofNullable(br);
        })
    .filter(Optional::isPresent)
    .map(Optional::get)
    .flatMap( pair -> { 
        try {
            // do something with br               
        } finally {
             try {
                 pair.getRight().close();
             } catch (IOException x ){
                 throw new RuntimeException(x) ;
             }
        }
    })