可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm writing a library for novice programmers so I'm trying to keep the API as clean as possible.
One of the things my Library needs to do is perform some complex computations on a large collection of ints or longs. There are lots of scenarios and business objects that my users need to compute these values from, so I thought the best way would be to use streams to allow users to map business objects to IntStream
or LongStream
and then compute the computations inside of a collector.
However IntStream and LongStream only have the 3 parameter collect method:
collect(Supplier<R> supplier, ObjIntConsumer<R> accumulator, BiConsumer<R,R> combiner)
And doesn't have the simplier collect(Collector)
method that Stream<T>
has.
So instead of being able to do
Collection<T> businessObjs = ...
MyResult result = businessObjs.stream()
.mapToInt( ... )
.collect( new MyComplexComputation(...));
I have to do provide Suppliers, accumulators and combiners like this:
MyResult result = businessObjs.stream()
.mapToInt( ... )
.collect(
()-> new MyComplexComputationBuilder(...),
(builder, v)-> builder.add(v),
(a,b)-> a.merge(b))
.build(); //prev collect returns Builder object
This is way too complicated for my novice users and is very error prone.
My work around is to make static methods that take an IntStream
or LongStream
as input and hide the collector creation and execution for you
public static MyResult compute(IntStream stream, ...){
return .collect(
()-> new MyComplexComputationBuilder(...),
(builder, v)-> builder.add(v),
(a,b)-> a.merge(b))
.build();
}
But that doesn't follow the normal conventions of working with Streams:
IntStream tmpStream = businessObjs.stream()
.mapToInt( ... );
MyResult result = MyUtil.compute(tmpStream, ...);
Because you have to either save a temp variable and pass that to the static method, or create the Stream inside the static call which may be confusing when it's is mixed in with the other parameters to my computation.
Is there a cleaner way to do this while still working with IntStream
or LongStream
?
回答1:
We did in fact prototype some Collector.OfXxx
specializations. What we found -- in addition to the obvious annoyance of more specialized types -- was that this was not really very useful without having a full complement of primitive-specialized collections (like Trove does, or GS-Collections, but which the JDK does not have). Without an IntArrayList, for example, a Collector.OfInt merely pushes the boxing somewhere else -- from the Collector to the container -- which no big win, and lots more API surface.
回答2:
Perhaps if method references are used instead of lambdas, the code needed for the primitive stream collect will not seem as complicated.
MyResult result = businessObjs.stream()
.mapToInt( ... )
.collect(
MyComplexComputationBuilder::new,
MyComplexComputationBuilder::add,
MyComplexComputationBuilder::merge)
.build(); //prev collect returns Builder object
In Brian's definitive answer to this question, he mentions two other Java collection frameworks that do have primitive collections that actually can be used with the collect method on primitive streams. I thought it might be useful to illustrate some examples of how to use the primitive containers in these frameworks with primitive streams. The code below will also work with a parallel stream.
// Eclipse Collections
List<Integer> integers = Interval.oneTo(5).toList();
Assert.assertEquals(
IntInterval.oneTo(5),
integers.stream()
.mapToInt(Integer::intValue)
.collect(IntArrayList::new, IntArrayList::add, IntArrayList::addAll));
// Trove Collections
Assert.assertEquals(
new TIntArrayList(IntStream.range(1, 6).toArray()),
integers.stream()
.mapToInt(Integer::intValue)
.collect(TIntArrayList::new, TIntArrayList::add, TIntArrayList::addAll));
Note: I am a committer for Eclipse Collections.
回答3:
Convert the primitive streams to boxed object streams if there are methods you're missing.
MyResult result = businessObjs.stream()
.mapToInt( ... )
.boxed()
.collect( new MyComplexComputation(...));
Or don't use the primitive streams in the first place and work with Integer
s the whole time.
MyResult result = businessObjs.stream()
.map( ... ) // map to Integer not int
.collect( new MyComplexComputation(...));
回答4:
I've implemented the primitive collectors in my library StreamEx (since version 0.3.0). There are interfaces IntCollector
, LongCollector
and DoubleCollector
which extend the Collector
interface and specialized to work with primitives. There's an additional minor difference in combining procedure as methods like IntStream.collect
accept a BiConsumer
instead of BinaryOperator
.
There is a bunch of predefined collection methods to join numbers to string, store to primitive array, to BitSet
, find min, max, sum, calculate summary statistics, perform group-by and partition-by operations. Of course, you can define your own collectors. Here's several usage examples (assumed that you have int[] input
array with input data).
Join numbers as string with separator:
String nums = IntStreamEx.of(input).collect(IntCollector.joining(","));
Grouping by last digit:
Map<Integer, int[]> groups = IntStreamEx.of(input)
.collect(IntCollector.groupingBy(i -> i % 10));
Sum positive and negative numbers separately:
Map<Boolean, Integer> sums = IntStreamEx.of(input)
.collect(IntCollector.partitioningBy(i -> i > 0, IntCollector.summing()));
Here's a simple benchmark which compares these collectors and usual object collectors.
Note that my library does not provide (and will not provide in future) any user-visible data structures like maps on primitives, so grouping is performed into usual HashMap
. However if you are using Trove/GS/HFTC/whatever, it's not so difficult to write additional primitive collectors for the data structures defined in these libraries to gain more performance.
回答5:
Mr. Geotz provided the definitive answer for why the decision was made not to include specialized Collectors, however, I wanted to further investigate how much this decision affected performance.
I thought I would post my results as an answer.
I used the jmh microbenchmark framework to time how long it takes to compute calculations using both kinds of Collectors over collections of sizes 1, 100, 1000, 100,000 and 1 million:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class MyBenchmark {
@Param({"1", "100", "1000", "100000", "1000000"})
public int size;
List<BusinessObj> seqs;
@Setup
public void setup(){
seqs = new ArrayList<BusinessObj>(size);
Random rand = new Random();
for(int i=0; i< size; i++){
//these lengths are random but over 128 so no caching of Longs
seqs.add(BusinessObjFactory.createOfRandomLength());
}
}
@Benchmark
public double objectCollector() {
return seqs.stream()
.map(BusinessObj::getLength)
.collect(MyUtil.myCalcLongCollector())
.getAsDouble();
}
@Benchmark
public double primitiveCollector() {
LongStream stream= seqs.stream()
.mapToLong(BusinessObj::getLength);
return MyUtil.myCalc(stream)
.getAsDouble();
}
public static void main(String[] args) throws RunnerException{
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.build();
new Runner(opt).run();
}
}
Here are the results:
# JMH 1.9.3 (released 4 days ago)
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.objectCollector
# Run complete. Total time: 01:30:31
Benchmark (size) Mode Cnt Score Error Units
MyBenchmark.objectCollector 1 avgt 200 140.803 ± 1.425 ns/op
MyBenchmark.objectCollector 100 avgt 200 5775.294 ± 67.871 ns/op
MyBenchmark.objectCollector 1000 avgt 200 70440.488 ± 1023.177 ns/op
MyBenchmark.objectCollector 100000 avgt 200 10292595.233 ± 101036.563 ns/op
MyBenchmark.objectCollector 1000000 avgt 200 100147057.376 ± 979662.707 ns/op
MyBenchmark.primitiveCollector 1 avgt 200 140.971 ± 1.382 ns/op
MyBenchmark.primitiveCollector 100 avgt 200 4654.527 ± 87.101 ns/op
MyBenchmark.primitiveCollector 1000 avgt 200 60929.398 ± 1127.517 ns/op
MyBenchmark.primitiveCollector 100000 avgt 200 9784655.013 ± 113339.448 ns/op
MyBenchmark.primitiveCollector 1000000 avgt 200 94822089.334 ± 1031475.051 ns/op
As you can see, the primitive Stream version is slightly faster, but even when there are 1 million elements in the collection, it is only 0.05 seconds faster (on average).
For my API I would rather keep to the cleaner Object Stream conventions and use the Boxed version since it is such a minor performance penalty.
Thanks to everyone who shed insight into this issue.