Is flatMap guaranteed to be lazy?

2019-01-14 19:45发布

问题:

Consider the following code:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

Will fetchDataFromInternet be called for second url when the first one was enough?

I tried with a smaller example and it looks like working as expected. i.e processes data one by one but can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?

    Stream.of("one", "two", "three")
            .flatMap(num -> {
                System.out.println("Processing " + num);
                // return FetchFromInternetForNum(num).data().stream();
                return Stream.of(num);
            })
            .peek(num -> System.out.println("Peek before filter: "+ num))
            .filter(num -> num.length() > 0)
            .peek(num -> System.out.println("Peek after filter: "+ num))
            .forEach(num -> {
                System.out.println("Done " + num);
            });

Output:

Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three

Update: Using official Oracle JDK8 if that matters on implementation

Answer: Based on the comments and the answers below, flatmap is partially lazy. i.e reads the first stream fully and only when required, it goes for next. Reading a stream is eager but reading multiple streams is lazy.

If this behavior is intended, the API should let the function return an Iterable instead of a stream.

In other words: link

回答1:

Under the current implementation, flatmap is eager; like any other stateful intermediate operation (like sorted and distinct). And it's very easy to prove :

 int result = Stream.of(1)
            .flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
            .findFirst()
            .get();

    System.out.println(result);

This never finishes as flatMap is computed eagerly. For your example:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

It means that for each url, the flatMap will block all others operation that come after it, even if you care about a single one. So let's suppose that from a single url your fetchDataFromInternet(url) generates 10_000 lines, well your findFirst will have to wait for all 10_000 to be computed, even if you care about only one.

EDIT

This is fixed in Java 10, where we get our laziness back: see JDK-8075939



回答2:

It’s not clear why you set up an example that does not address the actual question, you’re interested in. If you want to know, whether the processing is lazy when applying a short-circuiting operation like findFirst(), well, then use an example using findFirst() instead of forEach that processes all elements anyway. Also, put the logging statement right into the function whose evaluation you want to track:

Stream.of("hello", "world")
      .flatMap(s -> {
          System.out.println("flatMap function evaluated for \""+s+'"');
          return s.chars().boxed();
      })
      .peek(c -> System.out.printf("processing element %c%n", c))
      .filter(c -> c>'h')
      .findFirst()
      .ifPresent(c -> System.out.printf("found an %c%n", c));
flatMap function evaluated for "hello"
processing element h
processing element e
processing element l
processing element l
processing element o
found an l

This demonstrates that the function passed to flatMap gets evaluated lazily as expected while the elements of the returned (sub-)stream are not evaluated as lazy as possible, as already discussed in the Q&A you have linked yourself.

So, regarding your fetchDataFromInternet method that gets invoked from the function passed to flatMap, you will get the desired laziness. But not for the data it returns.