When should I use IntStream.range in Java?

Posted 2020-05-21 07:37

Question:

I would like to know when I can use IntStream.range effectively. I have three reasons why I am not sure how useful IntStream.range is.

(Please think of start and end as integers.)

  1. If I want an array, [start, start+1, ..., end-2, end-1], the code below is much faster.

    int[] arr = new int[end - start];
    int index = 0;
    for(int i = start; i < end; i++)
        arr[index++] = i;
    

    This is probably because toArray() in IntStream.range(start, end).toArray() is very slow.

  2. I use MersenneTwister to shuffle arrays. (I downloaded MersenneTwister class online.) I do not think there is a way to shuffle IntStream using MersenneTwister.

  3. I do not think just getting int numbers from start to end-1 is useful. I can use for(int i = start; i < end; i++), which seems easier and not slow.

Could you tell me when I should choose IntStream.range?

Answer 1:

There are several uses for IntStream.range.

One is to use the int values themselves:

IntStream.range(start, end).filter(i -> isPrime(i))....

Another is to do something N times:

IntStream.range(0, N).forEach(this::doSomething);

Your case (1) is to create an array filled with a range:

int[] arr = IntStream.range(start, end).toArray();

You say this is "very slow" but, like other respondents, I suspect your benchmark methodology. For small arrays there is indeed more overhead with stream setup, but this should be so small as to be unnoticeable. For large arrays the overhead should be negligible, as filling a large array is dominated by memory bandwidth.

Sometimes you need to fill an existing array. You can do that this way:

int[] arr = new int[end - start];
IntStream.range(0, end - start).forEach(i -> arr[i] = i + start);

There's a utility method Arrays.setAll that can do this even more concisely:

int[] arr = new int[end - start];
Arrays.setAll(arr, i -> i + start);

There is also Arrays.parallelSetAll, which can fill an existing array in parallel. Internally, it simply uses an IntStream and calls parallel() on it. This should provide a speedup for large arrays on a multicore system.
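
As a minimal sketch of the parallel variant, the snippet below wraps Arrays.parallelSetAll in a helper. The class name RangeFillDemo and the method fillParallel are made up for illustration; whether this is actually faster than the plain loop depends on array size and hardware, so measure before relying on it.

```java
import java.util.Arrays;

class RangeFillDemo {
    // Hypothetical helper: returns {start, start+1, ..., end-1},
    // computing each element from its index, possibly in parallel.
    static int[] fillParallel(int start, int end) {
        int[] arr = new int[end - start];
        Arrays.parallelSetAll(arr, i -> i + start);
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillParallel(3, 8))); // prints [3, 4, 5, 6, 7]
    }
}
```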

I've found that a fair number of my answers on Stack Overflow involve using IntStream.range. You can search for them using these search criteria in the search box:

user:1441122 IntStream.range

One application of IntStream.range I find particularly useful is to operate on elements of an array, where the array indexes as well as the array's values participate in the computation. There's a whole class of problems like this.

For example, suppose you want to find the locations of increasing runs of numbers within an array. The result is an array of indexes into the first array, where each index points to the start of a run.

To compute this, observe that a run starts at a location where the value is less than the previous value. (A run also starts at location 0). Thus:

    int[] arr = { 1, 3, 5, 7, 9, 2, 4, 6, 3, 5, 0 };
    int[] runs = IntStream.range(0, arr.length)
                          .filter(i -> i == 0 || arr[i-1] > arr[i])
                          .toArray();
    System.out.println(Arrays.toString(runs));

    [0, 5, 8, 10]

Of course, you could do this with a for-loop, but I find that using IntStream is preferable in many cases. For example, it's easy to store an unknown number of results into an array using toArray(), whereas with a for-loop you have to handle copying and resizing, which distracts from the core logic of the loop.

Finally, it's much easier to run IntStream.range computations in parallel.
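
To illustrate the point about parallelism, here is a rough sketch that counts primes in a range either sequentially or in parallel. The isPrime helper is a simple trial-division stand-in (the isPrime used earlier in this answer is not shown, so this one is an assumption), and the class and method names are hypothetical.

```java
import java.util.stream.IntStream;

class ParallelRangeDemo {
    // Illustrative trial-division primality test; not optimized.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts primes in [start, end). The only difference between the
    // sequential and parallel versions is the call to parallel().
    static long countPrimes(int start, int end, boolean parallel) {
        IntStream range = IntStream.range(start, end);
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(ParallelRangeDemo::isPrime).count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(0, 100, false)); // prints 25
        System.out.println(countPrimes(0, 100, true));  // prints 25
    }
}
```

Doing the same with a for-loop would mean splitting the range by hand and managing threads or an executor.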



Answer 2:

Here's an example:

public class Test {

    public static void main(String[] args) {
        System.out.println(sum(LongStream.of(40,2))); // call A
        System.out.println(sum(LongStream.range(1,100_000_000))); //call B
    }

    public static long sum(LongStream in) {
        return in.sum();
    }

}

So, let's look at what sum() does: it computes the sum of an arbitrary stream of numbers. We call it in two different ways: once with an explicit list of numbers, and once with a range.

If you only had call A, you might be tempted to put the two numbers into an array and pass it to sum(), but that's clearly not an option with call B (you'd run out of memory). Likewise, you could just pass the start and end for call B, but then you couldn't support the case of call A.

So to sum it up, ranges are useful here because:

  • We need to pass them around between methods
  • The target method doesn't just work on ranges but any stream of numbers
  • But it only operates on individual numbers of the stream, reading them sequentially. (This is why shuffling with streams is a terrible idea in general.)

There is also the readability argument: code using streams can be much more concise than loops, and thus more readable, but I wanted to show an example where a solution relying on IntStreams is functionally superior too.

I used LongStream to emphasise the point, but the same goes for IntStream.

And yes, for simple summing this may look like a bit of overkill, but consider, for example, reservoir sampling.
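
A rough sketch of what reservoir sampling over a stream could look like, using the classic "Algorithm R". Everything here (the class ReservoirDemo, the method sample, the use of java.util.Random) is an assumption for illustration. It must run sequentially, because it relies on seeing the elements one at a time.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.LongStream;

class ReservoirDemo {
    // Keeps a uniform random sample of up to k elements from a stream
    // of unknown length, using O(k) memory (Algorithm R).
    static long[] sample(LongStream in, int k, Random rnd) {
        long[] reservoir = new long[k];
        long[] seen = {0}; // one-element array so the lambda can update the count
        in.sequential().forEachOrdered(x -> {
            long n = seen[0]++;
            if (n < k) {
                reservoir[(int) n] = x; // fill the reservoir first
            } else {
                // pick j uniformly in [0, n]; replace a slot if j < k
                long j = (long) (rnd.nextDouble() * (n + 1));
                if (j < k) {
                    reservoir[(int) j] = x;
                }
            }
        });
        // if the stream had fewer than k elements, return only those
        return seen[0] < k ? Arrays.copyOf(reservoir, (int) seen[0]) : reservoir;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(Arrays.toString(sample(LongStream.of(40, 2), 5, rnd)));
        System.out.println(Arrays.toString(sample(LongStream.range(1, 100_000_000), 5, rnd)));
    }
}
```

As with sum(), the same method handles both an explicit list and a huge range without materializing either.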



Answer 3:

IntStream.range returns a range of integers as a stream so you can do stream processing over it.

For example, to take the square of each element:

IntStream.range(1, 10).map(i -> i * i);  


Answer 4:

Basically, if you want Stream operations, you can use the range() method. For example, if you want concurrency, or want to use map() or reduce(), you are better off with IntStream.

For example:

IntStream.range(1, 5).parallel().forEach(i -> heavyOperation());

Or:

IntStream.range(1, 5).reduce(1, (x, y) -> x * y)  
// > 24

You can achieve the second example also with a for-loop, but you need intermediate variables etc.

Also, if you want only the first match, for example, you can use findFirst() and its cousins to stop consuming the rest of the Stream.
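
A small sketch of the short-circuiting behaviour, with a made-up helper firstMultiple: even though the range is huge, the pipeline stops as soon as findFirst() has a result.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class FindFirstDemo {
    // Returns the first value in [start, end) divisible by divisor, if any.
    static OptionalInt firstMultiple(int start, int end, int divisor) {
        return IntStream.range(start, end)
                        .filter(i -> i % divisor == 0)
                        .findFirst(); // short-circuits: later values are never generated
    }

    public static void main(String[] args) {
        System.out.println(firstMultiple(1, 1_000_000_000, 97)); // prints OptionalInt[97]
    }
}
```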



Answer 5:

It totally depends on the use case. However, the Stream API adds a lot of easy one-liners that can replace conventional loops.

IntStream is really helpful as syntactic sugar in some cases:

IntStream.range(1, 101).sum();
IntStream.range(1, 101).average();
IntStream.range(1, 101).filter(i -> i % 2 == 0).count();
//... and so on

Whatever you can do with IntStream you can also do with conventional loops, but a one-liner is more concise and easier to understand and maintain.

Still, IntStream.range cannot count downward directly; it only produces increasing values. So the following loop has no direct equivalent:

for(int i = 100; i > 1; i--) {
    // Negative loop
}
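
That said, a descending sequence can be obtained indirectly by mapping an ascending range. The sketch below is one possible workaround; the helper name descending is made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class DescendingRangeDemo {
    // Produces from, from-1, ..., toExclusive+1, mirroring
    // for (int i = from; i > toExclusive; i--).
    static int[] descending(int from, int toExclusive) {
        return IntStream.range(0, from - toExclusive) // empty if from <= toExclusive
                        .map(i -> from - i)
                        .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(descending(5, 1))); // prints [5, 4, 3, 2]
    }
}
```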

  • Case 1: Yes, the conventional loop is faster in this case, as toArray has a bit of overhead.

  • Case 2: I don't know anything about this, my apologies.

  • Case 3: IntStream is not slow at all; IntStream.range and a conventional loop are almost the same in terms of performance.

See:

  • Java 8 nested loops with streams & performance


Answer 6:

Here are a few differences that come to mind between IntStream.range and traditional for loops:

  • An IntStream is lazily evaluated: the pipeline is traversed only when a terminal operation is called. A for loop executes its body eagerly on each iteration.
  • IntStream provides functions that are commonly applied to a range of ints, such as sum and average.
  • IntStream lets you code multiple operations over a range of ints in a functional way, which reads more fluently, especially if you have a lot of operations.

So basically use IntStream when one or more of these differences are useful to you.

But please bear in mind that shuffling a Stream sounds quite strange: a Stream is not a data structure, so it does not really make sense to shuffle it (in case you were planning on building a special IntSupplier). Shuffle the result instead.

As for performance, while there may be some overhead, you still iterate N times in both cases, so you should not need to worry about it much.
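
"Shuffle the result" could look roughly like this: materialize the range with toArray(), then run a Fisher-Yates shuffle over the array. The sketch uses java.util.Random as a stand-in because the asker's MersenneTwister class is not shown; any generator offering a nextInt(bound)-style method could be substituted. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

class ShuffleRangeDemo {
    // Builds {start, ..., end-1} and shuffles it in place (Fisher-Yates).
    static int[] shuffledRange(int start, int end, Random rnd) {
        int[] arr = IntStream.range(start, end).toArray();
        for (int i = arr.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1); // uniform index in [0, i]
            int tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
        }
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shuffledRange(0, 10, new Random())));
    }
}
```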



Answer 7:

You could implement your Mersenne Twister as an Iterator and stream from that.
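
One way this might be wired up is sketched below. Since the MersenneTwister class in question is not shown, java.util.Random stands in for it; the class GeneratorStreamDemo and the method randomInts are hypothetical. The iterator never ends, so the resulting stream is infinite and needs limit() or another short-circuiting operation.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.Random;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class GeneratorStreamDemo {
    // Wraps a random generator (stand-in for a Mersenne Twister) as an
    // iterator and exposes it as an infinite sequential IntStream.
    static IntStream randomInts(Random rnd, int bound) {
        PrimitiveIterator.OfInt it = new PrimitiveIterator.OfInt() {
            @Override
            public boolean hasNext() {
                return true; // the generator never runs out
            }

            @Override
            public int nextInt() {
                return rnd.nextInt(bound);
            }
        };
        Spliterator.OfInt split = Spliterators.spliteratorUnknownSize(
                it, Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.intStream(split, false); // false = sequential
    }

    public static void main(String[] args) {
        int[] values = randomInts(new Random(), 100).limit(5).toArray();
        System.out.println(Arrays.toString(values));
    }
}
```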