Stream#filter Runs out of Memory for 1,000,000 items

Asked 2019-06-25 14:30

Let's say I have a Stream of length 1,000,000 with all 1's.

scala> val million = Stream.fill(1000000)(1)
million: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> million filter (x => x % 2 == 0)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

I get an OutOfMemoryError.

Then, I tried the same filter call with List.

scala> val y = List.fill(1000000)(1)
y: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ...

scala> y.filter(x => x % 2 == 0)
res2: List[Int] = List()

Yet it succeeds.

Why does Stream#filter run out of memory here, while List#filter completes just fine?

Lastly, with a large stream, will filter result in the non-lazy evaluation of the entire stream?

1 Answer
女痞
Answered 2019-06-25 14:48

Overhead of List: a single object (an instance of ::) with 2 fields (2 pointers) per element.

Overhead of Stream: an instance of Cons with 3 pointers, plus an instance of Function0 (the thunk behind the by-name tl: => Stream[A] that lazily evaluates Stream#tail), per element.

So a Stream spends roughly twice as much memory per element as a List.
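
Roughly, the two cell layouts look like this (an illustrative sketch with made-up names, not the actual library source):

// One List cell: a single object holding the head and a tail pointer.
final class ListCell[A](val head: A, val tail: ListCell[A])

// One Stream cell: head, a memoized tail, and the captured thunk.
// The by-name parameter tl is retained as a hidden Function0 field,
// which is the extra allocation mentioned above.
final class StreamCell[A](val head: A, tl: => StreamCell[A]) {
  private[this] var tlEvaluated = false
  private[this] var tlCache: StreamCell[A] = _   // memoized tail
  def tail: StreamCell[A] = {
    if (!tlEvaluated) { tlCache = tl; tlEvaluated = true }
    tlCache
  }
}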

You have defined your Stream as a val, so a reference to the head of the stream is retained for the whole traversal and none of the memoized elements can be garbage-collected. If you instead define million as a def, nothing pins the head: after filter moves past an element the GC can reclaim its cons cell, and you get your memory back.
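
A minimal sketch of that def variant (hypothetical session; expected output shown, assuming the GC can keep up):

scala> def million: Stream[Int] = Stream.fill(1000000)(1)
million: Stream[Int]

scala> million filter (x => x % 2 == 0)   // visited cells can be reclaimed as filter advances
res0: scala.collection.immutable.Stream[Int] = Stream()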

Note that only the tail of a Stream is lazy; the head is strict. So filter evaluates eagerly until it finds the first element that satisfies the given predicate, and since no element of your stream is even, filter iterates over the entire million-element stream, forcing (and memoizing) every element along the way.
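
For contrast, a quick sketch of that "strict until the first match" behaviour (hypothetical REPL session):

scala> val evens = Stream.from(1).filter(_ % 2 == 0)
evens: scala.collection.immutable.Stream[Int] = Stream(2, ?)   // stops as soon as it finds 2

scala> evens.take(3).toList
res1: List[Int] = List(2, 4, 6)   // later matches are found on demand

// Stream.from(1).filter(_ < 0) would never return: no element matches,
// so filter keeps forcing elements forever, just like your all-ones stream.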
