Suppose I want to call groupBy on an iterator. The compiler reports "value groupBy is not a member of Iterator[Int]". One way would be to convert the iterator to a list, which I want to avoid. I want a groupBy whose input is Iterator[A] and whose output is Map[B, Iterator[A]], such that each part of the iterator is loaded only when the corresponding element is accessed, without loading the whole list into memory. I also know the possible set of keys, so I can say whether a particular key exists.
def groupBy[A, B](iter: Iterator[A], f: A => B): Map[B, Iterator[A]] = {
  ??? // the implementation is what I am asking for
}
One possibility is to convert the Iterator to a view and then call groupBy:
iter.toTraversable.view.groupBy(_.whatever)
I don't think this is doable without storing results in memory (and in this case switching to a list would be much easier). Iterator implies that you can make only one pass over the whole collection.
For instance, let's say you have the sequence 1 2 3 4 5 6 and you want to groupBy odd and even numbers:
groupBy(it, v => v % 2 == 0)
Then you could query the result with either true or false to get an iterator. The problem is that if you loop over one of those two iterators to the end, you cannot do the same for the other one: the underlying iterator has already been consumed, and you cannot reset an iterator in Scala.
This would be doable if the elements were sorted according to the same rule you're using in groupBy, because each group would then be a contiguous run that can be read in a single pass.
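As a minimal sketch of that sorted case: if elements with equal keys are adjacent, the groups can be handed out one after another without buffering more than the current head. The names SortedGroups and groupSorted are hypothetical (not part of the standard library), and the result is an Iterator of (key, group) pairs rather than a Map, because the groups are only available in source order.

```scala
object SortedGroups {
  // Hypothetical helper. Assumes elements with equal keys are adjacent in `iter`
  // (for example, because the input is sorted by the grouping key).
  def groupSorted[A, B](iter: Iterator[A])(f: A => B): Iterator[(B, Iterator[A])] = {
    val source = iter.buffered // lets us peek at the next element via `head`

    new Iterator[(B, Iterator[A])] {
      private var current: Iterator[A] = Iterator.empty

      def hasNext: Boolean = {
        // Skip whatever the caller left unread in the previous group.
        while (current.hasNext) current.next()
        source.hasNext
      }

      def next(): (B, Iterator[A]) = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        val key = f(source.head)
        // The group ends as soon as the next element maps to a different key.
        val group = new Iterator[A] {
          def hasNext: Boolean = source.hasNext && f(source.head) == key
          def next(): A =
            if (hasNext) source.next()
            else throw new NoSuchElementException("next on empty group")
        }
        current = group
        (key, group)
      }
    }
  }
}
```

Each group has to be read before moving on to the next one; anything left unread in a group is discarded when the outer iterator advances.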
As said in other responses, the only way to achieve a lazy groupBy on an Iterator is to buffer elements internally. The worst-case memory usage is O(n). If you know in advance that the keys are well distributed in your iterator, buffering can be a viable solution.
The solution is relatively complex, but a good starting point is a few methods from the Iterator trait in the Scala source code:
- The partition method, which uses both the buffered method to keep the head value in memory and two internal queues (lookahead), one for each of the produced iterators.
- The span method, which also uses the buffered method, this time with a single queue for the leading iterator.
- The duplicate method. Perhaps less interesting, but it shows another use of a queue, here to store the gap between the two produced iterators.
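For the two-group case, the standard partition method can be used directly, and it shows the buffering behaviour in practice. The sketch below uses the odd/even example; the wrapper object PartitionDemo and the method evensAndOdds are illustrative names only.

```scala
object PartitionDemo {
  def evensAndOdds(): (List[Int], List[Int]) = {
    val (evens, odds) = Iterator(1, 2, 3, 4, 5, 6).partition(_ % 2 == 0)

    // Draining `evens` first forces the whole source to be read; the odd
    // elements skipped along the way are buffered internally by the library.
    val evenList = evens.toList

    // `odds` is then served from that buffer, so nothing is lost.
    val oddList = odds.toList

    (evenList, oddList) // (List(2, 4, 6), List(1, 3, 5))
  }
}
```

Here draining one side completely is the worst case: every element belonging to the other side has to sit in memory until it is read.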
In the groupBy case, we will have a variable number of produced iterators instead of the two in the examples above. If requested, I can try to write this method.
Note that you have to know the list of keys in advance. Otherwise, you will need to traverse (and buffer) the entire iterator to collect the different keys to build your Map.
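A minimal sketch of such a buffered groupBy, assuming the keys are known up front, could look like the following. LazyGroupBy and lazyGroupBy are hypothetical names, not standard library API, and dropping elements whose key is not in the given set is a design choice made only for this illustration.

```scala
import scala.collection.mutable

object LazyGroupBy {
  // Hypothetical helper: `keys` is the set of keys known in advance.
  def lazyGroupBy[A, B](iter: Iterator[A], keys: Set[B])(f: A => B): Map[B, Iterator[A]] = {
    // One lookahead queue per known key.
    val queues: Map[B, mutable.Queue[A]] =
      keys.iterator.map(k => k -> mutable.Queue.empty[A]).toMap

    // Pull from the source until an element for `key` is buffered
    // or the source is exhausted.
    def fill(key: B): Unit = {
      val q = queues(key)
      while (q.isEmpty && iter.hasNext) {
        val a = iter.next()
        // Elements whose key is not in `keys` are dropped (assumption of this sketch).
        queues.get(f(a)).foreach(_ += a)
      }
    }

    def groupIterator(key: B): Iterator[A] = new Iterator[A] {
      def hasNext: Boolean = { fill(key); queues(key).nonEmpty }
      def next(): A = {
        fill(key)
        val q = queues(key)
        if (q.isEmpty) throw new NoSuchElementException("next on empty iterator")
        else q.dequeue()
      }
    }

    // Building the Map creates the iterators but does not consume the source.
    keys.iterator.map(k => k -> groupIterator(k)).toMap
  }
}
```

With the earlier example, `LazyGroupBy.lazyGroupBy(Iterator(1, 2, 3, 4, 5, 6), Set(true, false))(_ % 2 == 0)` returns a Map whose true iterator yields 2, 4, 6 and whose false iterator yields 1, 3, 5. Reading one group to the end still buffers every element of the other groups, so the O(n) worst case remains. The sketch is not thread-safe.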