My exact scenario is inserting data into a database in batches, so I want to accumulate DOM objects and then flush them every 1000.
I implemented it by putting code in the accumulator to detect fullness and then flush, but that seems wrong: the flush control should come from the caller.
I could convert the stream to a List and then use subList in an iterative fashion, but that too seems clunky.
Is there a neat way to take action every n elements and then continue with the stream, while only processing the stream once?
Most of the answers above do not take advantage of stream benefits such as saving memory. You can use an iterator to solve the problem.
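A sketch of that idea; the helper name and the consumer are illustrative, and only one chunk is held in memory at a time:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

static <T> void forEachChunk(Stream<T> stream, int chunkSize, Consumer<List<T>> consumer) {
    Iterator<T> it = stream.iterator();
    while (it.hasNext()) {
        List<T> chunk = new ArrayList<>(chunkSize);
        // fill the chunk until it is full or the stream runs out
        while (it.hasNext() && chunk.size() < chunkSize) {
            chunk.add(it.next());
        }
        consumer.accept(chunk);  // e.g. flush this batch to the database
    }
}
```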
If you have a Guava dependency in your project, you could use `Lists.partition`; see https://google.github.io/guava/releases/23.0/api/docs/com/google/common/collect/Lists.html#partition-java.util.List-int-
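A sketch, assuming the stream is collected first (`Row` and `insertBatch` are illustrative stand-ins):

```java
import com.google.common.collect.Lists;
import java.util.List;
import java.util.stream.Collectors;

List<Row> rows = stream.collect(Collectors.toList());   // materializes the whole stream
for (List<Row> batch : Lists.partition(rows, 1000)) {   // consecutive sublist views; the last may be shorter
    insertBatch(batch);                                  // illustrative batch insert
}
```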
Looks like no, because creating chunks means reducing the stream, and reduce means termination. If you need to maintain the stream's nature and process chunks without collecting all the data first, here is my approach (it does not work for parallel streams).
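A sketch of it, with `chunked` and the consumer as illustrative names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

static <T> void chunked(Stream<T> stream, int chunkSize, Consumer<List<T>> consumer) {
    stream.map(e -> new ArrayList<>(Collections.singletonList(e))) // the 'pattern': one list per element
          .reduce((acc, next) -> {
              if (acc.size() == chunkSize) { // current chunk is full:
                  consumer.accept(acc);      // process it...
                  return next;               // ...and start the next chunk
              }
              acc.addAll(next);              // otherwise keep merging
              return acc;
          })
          .ifPresent(consumer);              // process the last 'trimmed' chunk
}
```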
And here is how to use it:
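```java
chunked(Stream.of(1, 2, 3, 4, 5, 6, 7), 3, System.out::println);
```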
It will print:
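```
[1, 2, 3]
[4, 5, 6]
[7]
```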
The idea behind it is to create lists in a map operation following a 'pattern', and to merge (and process) them with reduce. And don't forget to process the last 'trimmed' chunk: that is what the final `ifPresent` call in the sketch above does.
Using the StreamEx library, the solution would look like this:
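A sketch using `groupRuns` with a stateful counter (the input and the chunk size of 4 are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import one.util.streamex.StreamEx;

Stream<Integer> stream = IntStream.iterate(0, i -> i + 1).boxed().limit(15);
AtomicInteger counter = new AtomicInteger(0);
int chunkSize = 4;

StreamEx.of(stream)
        .groupRuns((prev, next) -> counter.incrementAndGet() % chunkSize != 0)
        .forEach(System.out::println);   // each chunk arrives as a List<Integer>
```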
Output:
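```
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14]
```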
`groupRuns` accepts a predicate that decides whether two elements should be in the same group. It produces a group as soon as it finds the first element that does not belong to it.
Elegance is in the eye of the beholder. If you don't mind using a stateful function in `groupingBy`, you can do this:
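A sketch of that; `database::flushChunk` stands in for whatever performs the batch insert:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

int chunkSize = 1000;
AtomicInteger counter = new AtomicInteger();

stream.collect(Collectors.groupingBy(x -> counter.getAndIncrement() / chunkSize))
      .values()                          // one List per chunk index (value order not guaranteed)
      .forEach(database::flushChunk);    // illustrative batch insert
```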
This doesn't win any performance or memory usage points over your original solution, because it will still materialize the entire stream before doing anything.
If you want to avoid materializing the list, the Stream API will not help you. You will have to get the stream's iterator or spliterator and do something like this:
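A sketch, with `Row` and `database::flushChunk` again as illustrative stand-ins:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

Spliterator<Row> split = stream.spliterator();
int chunkSize = 1000;

while (true) {
    List<Row> chunk = new ArrayList<>(chunkSize);
    // pull up to chunkSize elements; tryAdvance returns false once the stream is exhausted
    for (int i = 0; i < chunkSize && split.tryAdvance(chunk::add); i++) { }
    if (chunk.isEmpty()) {
        break;                        // nothing left
    }
    database.flushChunk(chunk);       // illustrative batch insert
}
```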
As Misha rightfully said, elegance is in the eye of the beholder. I personally think an elegant solution would be to let the class that inserts into the database do this task, similar to a `BufferedWriter`. This way it does not depend on your original data structure and can be used even with multiple streams, one after another. I am not sure if this is exactly what you mean by having the code in the accumulator, which you thought was wrong. I don't think it is wrong, since existing classes like `BufferedWriter` work this way. You get some flush control from the caller by calling `flush()` on the writer at any point. Something like the following code:
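(A sketch; the class name and `insertBatch` are illustrative, and the buffer size is hard-coded to 1000 here.)

```java
import java.util.ArrayList;
import java.util.List;

public class BufferedDatabaseWriter<T> {
    private final List<T> buffer = new ArrayList<>(1000);

    public void accept(T row) {
        buffer.add(row);
        if (buffer.size() >= 1000) {
            flush();                 // buffer is full: write this batch out
        }
    }

    public void flush() {
        if (buffer.isEmpty()) {
            return;                  // nothing buffered yet
        }
        insertBatch(buffer);         // illustrative: the actual batch INSERT goes here
        buffer.clear();
    }

    private void insertBatch(List<T> batch) {
        // run the database batch insert for up to 1000 rows
    }
}
```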
Now your stream gets processed like this:
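(With `DomObject` standing in for your element type.)

```java
BufferedDatabaseWriter<DomObject> writer = new BufferedDatabaseWriter<>();
stream.forEach(writer::accept);
writer.flush();   // don't forget the last, possibly partial, batch
```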
If you want to work multithreaded, you could run the flush asynchronously. Taking from the stream can't go in parallel, but I don't think there is a way to count 1000 elements from a stream in parallel anyway.
You can also extend the writer to allow setting the buffer size in the constructor, or you can make it implement `AutoCloseable` and run it in a try-with-resources, and more: the nice things you get from a `BufferedWriter`.