Closing mapped streams - what's the idea?

Posted 2020-06-12 04:43

Question:

It's well known that the Javadoc of the Stream interface says:

Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management. (If a stream does require closing, it can be declared as a resource in a try-with-resources statement.)
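To make the last sentence concrete, here is a minimal sketch of declaring a stream as a resource in try-with-resources. The class and method names are illustrative, and the onClose handler only stands in for an IO resource, since no file is involved here.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

public class TryWithResourcesDemo {

    // Returns true if the stream's close handler ran when the try block exited.
    static boolean closesAtEndOfTry() {
        AtomicBoolean closed = new AtomicBoolean(false);
        // A collection-backed stream; the onClose handler is a stand-in for
        // releasing an IO resource such as the file behind Files.lines.
        try (Stream<String> s = Arrays.asList("a", "b", "c").stream()
                .onClose(() -> closed.set(true))) {
            s.forEach(x -> { });        // consume the stream
        }                               // close() runs here and calls the handler
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closesAtEndOfTry()); // true
    }
}
```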

OK, but the same interface also has methods like flatMapToInt:

IntStream flatMapToInt(Function<? super T, ? extends IntStream> mapper);

for which the Javadoc specification says:

Each mapped stream is closed after its contents have been placed into this stream.
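That guarantee is observable, because IntStream inherits onClose from BaseStream and can therefore carry a close handler like any other stream. A minimal sketch (class and method names are illustrative) that counts how many of the mapped IntStreams flatMapToInt closes:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class FlatMapToIntCloseDemo {

    // Flat-maps each word to the IntStream of its chars, registers a close
    // handler on every such inner stream, and counts how many were closed.
    static int countClosedInnerStreams(String... words) {
        AtomicInteger closed = new AtomicInteger();
        Arrays.stream(words)
              .flatMapToInt(w -> w.chars().onClose(closed::incrementAndGet))
              .sum();                   // terminal operation drives the pipeline
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(countClosedInnerStreams("a", "bb", "ccc")); // 3
    }
}
```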

So I don't get the idea: if IntStream isn't designed to have an IO channel as its source, why is it closed inside this method?

For example, the ReferencePipeline implementation does it this way:

try (IntStream result = mapper.apply(u)) {
    if (result != null)
        result.sequential().forEach(downstreamAsInt);
}
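A simplified, hypothetical re-creation of that per-element pattern (pushMapped is an illustrative name, not JDK code) shows what it does for one upstream element. It also shows why the null check can sit inside the try: try-with-resources does not call close() on a null resource.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public class ManualFlatMapToInt {

    // Hypothetical helper mirroring the quoted ReferencePipeline pattern for
    // one upstream element u (a simplified sketch, not the actual JDK code).
    static <T> void pushMapped(T u,
                               Function<? super T, ? extends IntStream> mapper,
                               IntConsumer downstream) {
        try (IntStream result = mapper.apply(u)) {
            // try-with-resources skips close() for a null resource,
            // so a mapper may return null to contribute no elements.
            if (result != null) {
                result.sequential().forEach(downstream);
            }
        }   // result.close() runs here, invoking any registered close handlers
    }

    public static void main(String[] args) {
        List<Integer> sink = new ArrayList<>();
        boolean[] closed = {false};
        pushMapped("ab", s -> s.chars().onClose(() -> closed[0] = true),
                   v -> sink.add(v));
        pushMapped("ignored", s -> null, v -> sink.add(v)); // no exception
        System.out.println(sink + " closed=" + closed[0]); // [97, 98] closed=true
    }
}
```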

A more general question: should we care about closing streams like IntStream (or its subtypes) at all? If not, why does flatMapTo* care?

EDIT: @Tunaki has provided a very interesting email link. But it is all about flatMap, where I agree that in general we should close the stream. My question is about the special cases, flatMapToInt, flatMapToLong and so on, where I don't see any need to close the streams.

EDIT 2: @BrianGoetz is pinged here because the cited email is his, so he is part of this discussion :)

Answer 1:

The general rule about resource handling is that whoever is responsible for closing a resource is the one that opened it. The flatMap operation is the only operation in the Stream API that opens a Stream, so it is the only operation that will close it.

Quoting from this mail, Brian Goetz said:

To summarize, flatMap() is the only operation that internally closes the stream after its done, and for good reason -- it is the only case where the stream is effectively opened by the operation itself, and therefore should be closed by the operation too. Any other streams are assumed to be opened by the caller, and therefore should be closed by the caller.
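The contrast with other operations can be sketched with a close handler standing in for a real resource (class and method names are illustrative). map() just hands the inner IntStreams downstream and never closes them, so the code that consumes them has to do it.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class CallerClosesDemo {

    // map() merely returns the inner streams; it never closes them.
    static int closesWithMapOnly(String... words) {
        AtomicInteger closed = new AtomicInteger();
        Arrays.stream(words)
              .map(w -> w.chars().onClose(closed::incrementAndGet))
              .forEach(IntStream::sum);   // consumed, but nobody calls close()
        return closed.get();
    }

    // The code that receives each inner stream takes over the closing duty.
    static int closesWhenCallerCloses(String... words) {
        AtomicInteger closed = new AtomicInteger();
        Arrays.stream(words)
              .map(w -> w.chars().onClose(closed::incrementAndGet))
              .forEach(inner -> {
                  try (IntStream s = inner) {
                      s.sum();
                  }
              });
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closesWithMapOnly("a", "bb"));      // 0
        System.out.println(closesWhenCallerCloses("a", "bb")); // 2
    }
}
```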

The example given there is the following:

try (Stream<Path> paths = Files.walk(dir)) {
    Stream<String> stream = paths.flatMap(p ->  {
        try {
            return Files.lines(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
}

Files.lines(p) returns a Stream<String> of the lines of the file. When the flat-mapping operation is done with that stream, the resource opened to read the file is expected to be closed. The question is: closed by what? By flatMap itself, because it is the operation that opened the Stream in the first place.
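Here is a self-contained, runnable variant of the example, assuming a writable temporary directory. The class name, file names and counting handler are illustrative additions. The filter keeps Files.lines away from the directory itself, and the terminal count() is required because nothing runs, closing included, until a terminal operation executes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class FlatMapFilesDemo {

    // Walks dir, flat-maps each regular file to its lines, and returns
    // {number of lines, number of per-file streams that flatMap closed}.
    static long[] countLinesAndCloses(Path dir) throws IOException {
        AtomicInteger closed = new AtomicInteger();
        try (Stream<Path> paths = Files.walk(dir)) {
            long lines = paths
                    .filter(Files::isRegularFile)  // Files.lines fails on directories
                    .flatMap(p -> {
                        try {
                            // The extra handler only observes the close; the one
                            // that releases the file is registered by Files.lines.
                            return Files.lines(p).onClose(closed::incrementAndGet);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .count();
            return new long[] { lines, closed.get() };
        }
    }

    // Creates a temporary directory with two small files and runs the count.
    static long[] runDemo() {
        try {
            Path dir = Files.createTempDirectory("flatmap-demo");
            Files.write(dir.resolve("a.txt"), Arrays.asList("one", "two"));
            Files.write(dir.resolve("b.txt"), Arrays.asList("three"));
            return countLinesAndCloses(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = runDemo();
        System.out.println(r[0] + " lines, " + r[1] + " file streams closed");
    }
}
```

Each file's stream is closed as soon as flatMap has drained it, not when the outer try block ends.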

Files.lines returns a Stream with a pre-registered close handler that releases the underlying file (in JDK 8, by closing the BufferedReader it reads from). When flatMap is done with that stream, it invokes this close handler and the resources are correctly released.
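The same pattern can be sketched over in-memory readers, so no file system is needed. linesOf and TrackingReader are hypothetical names. linesOf registers a handler with onClose that closes the reader, which is how JDK 8's Files.lines is built.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.Stream;

public class CloseHandlerSketch {

    // Hypothetical helper: a stream of lines whose close handler closes
    // the reader it reads from.
    static Stream<String> linesOf(BufferedReader br) {
        return br.lines().onClose(() -> {
            try {
                br.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // A reader that remembers whether close() was called.
    static final class TrackingReader extends BufferedReader {
        boolean closed;

        TrackingReader(String text) {
            super(new StringReader(text));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Flat-maps two in-memory "files" and reports whether both readers closed.
    static boolean bothReadersClosed() {
        TrackingReader r1 = new TrackingReader("a\nb");
        TrackingReader r2 = new TrackingReader("c");
        long n = Stream.of(r1, r2).flatMap(CloseHandlerSketch::linesOf).count();
        return n == 3 && r1.closed && r2.closed;
    }

    public static void main(String[] args) {
        System.out.println(bothReadersClosed()); // true
    }
}
```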


The flatMapTo* operations close their mapped streams for the same reason: adhering to the rule above that every resource allocated by a process should be closed by that process.

Just to show that you can build an IntStream that has an underlying resource to close, consider the following pipeline, where each path is flat-mapped not to its lines but to the number of characters in each of its lines:

try (Stream<Path> paths = Files.walk(dir)) {
    IntStream stream = paths.flatMapToInt(p ->  {
        try {
            return Files.lines(p).mapToInt(String::length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
}
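This works because the close handler from Files.lines survives mapToInt: closing any stage of a pipeline runs all close handlers registered on that pipeline. A minimal in-memory sketch of that point follows. Each inner list stands in for one file's lines, an assumption made to avoid touching the file system, and the names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MapToIntKeepsCloseHandler {

    // A close handler is registered on the Stream<String> *before* mapToInt;
    // flatMapToInt closes the derived IntStream, and close() runs every
    // handler registered on that pipeline.
    static int[] totalLengthAndCloses(List<List<String>> files) {
        AtomicInteger closed = new AtomicInteger();
        int total = files.stream()
                .flatMapToInt(lines -> lines.stream()
                        .onClose(closed::incrementAndGet)
                        .mapToInt(String::length))
                .sum();
        return new int[] { total, closed.get() };
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("ab", "c"), Arrays.asList("def"));
        int[] r = totalLengthAndCloses(files);
        System.out.println(r[0] + " chars, " + r[1] + " closes"); // 6 chars, 2 closes
    }
}
```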