Multiple aggregate functions in Java 8 Stream API

Posted 2019-03-27 03:09

Question:

I have a class defined like this:

public class TimePeriodCalc {
    private double occupancy;
    private double efficiency;
    private String atDate;
}

I would like to perform the equivalent of the following SQL statement using the Java 8 Stream API.

SELECT atDate, AVG(occupancy), AVG(efficiency)
FROM TimePeriodCalc
GROUP BY atDate

I tried:

Collection<TimePeriodCalc> collector = result.stream().collect(groupingBy(p -> p.getAtDate(), ....

What can be put into the code to select multiple attributes? I'm thinking of using multiple Collectors but really don't know how to do so.

Answer 1:

To do it without a custom Collector (and without streaming again over the resulting map), you could do it like this. It's a bit dirty, since it first collects each group into a List<TimePeriodCalc> (in effect a Map<String, List<TimePeriodCalc>>) and then streams each list again to get the average doubles.

Since you need two averages, they are collected into a holder or a pair. In this case I'm using AbstractMap.SimpleEntry:

// Requires java.util.AbstractMap, java.util.Map, java.util.stream.Collectors and java.util.stream.Stream
Map<String, AbstractMap.SimpleEntry<Double, Double>> map = Stream.of(
        new TimePeriodCalc(12d, 10d, "A"),
        new TimePeriodCalc(2d, 16d, "A"))
    .collect(Collectors.groupingBy(TimePeriodCalc::getAtDate,
        Collectors.collectingAndThen(Collectors.toList(), list -> {
            // Each group's list is streamed twice, once per average
            double occupancy = list.stream().collect(
                Collectors.averagingDouble(TimePeriodCalc::getOccupancy));
            double efficiency = list.stream().collect(
                Collectors.averagingDouble(TimePeriodCalc::getEfficiency));
            // Key holds the average occupancy, value the average efficiency
            return new AbstractMap.SimpleEntry<>(occupancy, efficiency);
        })));

System.out.println(map); // {A=7.0=13.0}


Answer 2:

Assuming that your TimePeriodCalc class has all the necessary getters and an (occupancy, efficiency, atDate) constructor, this should get you the list you want:

List<TimePeriodCalc> result = new ArrayList<>(
    list.stream()
    .collect(Collectors.groupingBy(TimePeriodCalc::getAtDate, 
        Collectors.collectingAndThen(Collectors.toList(), TimePeriodCalc::avgTimePeriodCalc)))
    .values()
);

Where TimePeriodCalc.avgTimePeriodCalc is this method in the TimePeriodCalc class:

public static TimePeriodCalc avgTimePeriodCalc(List<TimePeriodCalc> list){
    return new TimePeriodCalc(
            list.stream().collect(Collectors.averagingDouble(TimePeriodCalc::getOccupancy)),
            list.stream().collect(Collectors.averagingDouble(TimePeriodCalc::getEfficiency)),
            list.get(0).getAtDate()
            );
}

The above can be combined into this monstrosity:

List<TimePeriodCalc> result = new ArrayList<>(
    list.stream()
    .collect(Collectors.groupingBy(TimePeriodCalc::getAtDate, 
        Collectors.collectingAndThen(
            Collectors.toList(), a -> {
                return new TimePeriodCalc(
                        a.stream().collect(Collectors.averagingDouble(TimePeriodCalc::getOccupancy)),
                        a.stream().collect(Collectors.averagingDouble(TimePeriodCalc::getEfficiency)),
                        a.get(0).getAtDate()
                        );
            }
        )))
    .values());

With input:

List<TimePeriodCalc> list = new ArrayList<>();
list.add(new TimePeriodCalc(10,10,"a"));
list.add(new TimePeriodCalc(10,10,"b"));
list.add(new TimePeriodCalc(10,10,"c"));
list.add(new TimePeriodCalc(5,5,"a"));
list.add(new TimePeriodCalc(0,0,"b"));

Printing each element of result (with a suitable toString()) would give:

TimePeriodCalc [occupancy=7.5, efficiency=7.5, atDate=a]
TimePeriodCalc [occupancy=5.0, efficiency=5.0, atDate=b]
TimePeriodCalc [occupancy=10.0, efficiency=10.0, atDate=c]
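
To try the above end to end, here is a minimal self-contained sketch. The constructor, getters and toString() on TimePeriodCalc are assumptions (the question only shows the fields), the names GroupAverageDemo, average and averagePerDate are made up for illustration, and a TreeMap is swapped in for the default HashMap purely so that the groups come out in key order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupAverageDemo {

    // Assumed completion of the question's class: constructor, getters and toString are not in the original.
    static class TimePeriodCalc {
        private final double occupancy;
        private final double efficiency;
        private final String atDate;

        TimePeriodCalc(double occupancy, double efficiency, String atDate) {
            this.occupancy = occupancy;
            this.efficiency = efficiency;
            this.atDate = atDate;
        }

        double getOccupancy() { return occupancy; }
        double getEfficiency() { return efficiency; }
        String getAtDate() { return atDate; }

        @Override
        public String toString() {
            return "TimePeriodCalc [occupancy=" + occupancy
                    + ", efficiency=" + efficiency + ", atDate=" + atDate + "]";
        }
    }

    // Reduces one non-empty group to a single element holding both averages.
    static TimePeriodCalc average(List<TimePeriodCalc> group) {
        return new TimePeriodCalc(
                group.stream().collect(Collectors.averagingDouble(TimePeriodCalc::getOccupancy)),
                group.stream().collect(Collectors.averagingDouble(TimePeriodCalc::getEfficiency)),
                group.get(0).getAtDate());
    }

    // Groups by atDate (TreeMap keeps keys sorted) and averages each group.
    static List<TimePeriodCalc> averagePerDate(List<TimePeriodCalc> input) {
        return new ArrayList<>(input.stream()
                .collect(Collectors.groupingBy(TimePeriodCalc::getAtDate, TreeMap::new,
                        Collectors.collectingAndThen(Collectors.toList(), GroupAverageDemo::average)))
                .values());
    }

    public static void main(String[] args) {
        List<TimePeriodCalc> list = Arrays.asList(
                new TimePeriodCalc(10, 10, "a"),
                new TimePeriodCalc(10, 10, "b"),
                new TimePeriodCalc(10, 10, "c"),
                new TimePeriodCalc(5, 5, "a"),
                new TimePeriodCalc(0, 0, "b"));
        averagePerDate(list).forEach(System.out::println);
    }
}
```

Each group is still streamed twice here, once per average, exactly as in the answer.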


Answer 3:

Here's a way with a custom collector. It only needs one pass, but it's not very easy, especially because of generics...

If you have this method:

@SuppressWarnings("unchecked")
@SafeVarargs
static <T, A, C extends Collector<T, A, Double>> Collector<T, ?, List<Double>>
averagingManyDoubles(ToDoubleFunction<? super T>... extractors) {

    List<C> collectors = Arrays.stream(extractors)
        .map(extractor -> (C) Collectors.averagingDouble(extractor))
        .collect(Collectors.toList());

    class Acc {
        List<A> averages = collectors.stream()
            .map(c -> c.supplier().get())
            .collect(Collectors.toList());

        void add(T elem) {
            IntStream.range(0, extractors.length).forEach(i ->
                collectors.get(i).accumulator().accept(averages.get(i), elem));
        }

        Acc merge(Acc another) {
            IntStream.range(0, extractors.length).forEach(i ->
                averages.set(i, collectors.get(i).combiner()
                    .apply(averages.get(i), another.averages.get(i))));
            return this;
        }

        List<Double> finish() {
            return IntStream.range(0, extractors.length)
                .mapToObj(i -> collectors.get(i).finisher().apply(averages.get(i)))
                .collect(Collectors.toList());
        }
    }
    return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finish);
}

This receives an array of functions that extract double values from each element of the stream. Each extractor is converted to a Collectors.averagingDouble collector, and the local Acc class holds one mutable accumulation container per collector. The accumulator function then forwards every element to each underlying accumulator, and the combiner and finisher functions delegate to the underlying collectors in the same way.

Usage is as follows:

Map<String, List<Double>> averages = list.stream()
    .collect(Collectors.groupingBy(
        TimePeriodCalc::getAtDate,
        averagingManyDoubles(
            TimePeriodCalc::getOccupancy,
            TimePeriodCalc::getEfficiency)));
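
If only these two averages are needed, the same one-pass idea can be sketched without the generics. This is a simplified stand-in for averagingManyDoubles rather than a replacement: the names TwoAveragesDemo, averagingBoth and averagesPerDate are hypothetical, the TimePeriodCalc constructor and getters are assumed, and it uses plain summation instead of the compensated summation used by the JDK's Collectors.averagingDouble implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TwoAveragesDemo {

    // Assumed completion of the question's class (constructor and getters are not in the original).
    static class TimePeriodCalc {
        private final double occupancy;
        private final double efficiency;
        private final String atDate;

        TimePeriodCalc(double occupancy, double efficiency, String atDate) {
            this.occupancy = occupancy;
            this.efficiency = efficiency;
            this.atDate = atDate;
        }

        double getOccupancy() { return occupancy; }
        double getEfficiency() { return efficiency; }
        String getAtDate() { return atDate; }
    }

    // One-pass collector producing {average occupancy, average efficiency}.
    // Accumulator layout: [0] = occupancy sum, [1] = efficiency sum, [2] = element count.
    static Collector<TimePeriodCalc, double[], double[]> averagingBoth() {
        return Collector.of(
                () -> new double[3],
                (acc, t) -> {
                    acc[0] += t.getOccupancy();
                    acc[1] += t.getEfficiency();
                    acc[2]++;
                },
                (a, b) -> {
                    // Merge partial results (used by parallel streams)
                    a[0] += b[0];
                    a[1] += b[1];
                    a[2] += b[2];
                    return a;
                },
                // Like averagingDouble, report 0 when no elements were seen
                acc -> acc[2] == 0
                        ? new double[] {0, 0}
                        : new double[] {acc[0] / acc[2], acc[1] / acc[2]});
    }

    static Map<String, double[]> averagesPerDate(List<TimePeriodCalc> input) {
        return input.stream()
                .collect(Collectors.groupingBy(TimePeriodCalc::getAtDate, averagingBoth()));
    }

    public static void main(String[] args) {
        List<TimePeriodCalc> list = Arrays.asList(
                new TimePeriodCalc(10, 10, "a"),
                new TimePeriodCalc(5, 5, "a"),
                new TimePeriodCalc(0, 4, "b"));
        averagesPerDate(list).forEach((date, avg) ->
                System.out.println(date + " -> " + Arrays.toString(avg)));
    }
}
```

The trade-off is flexibility: the fixed-size array hard-codes two attributes, whereas the varargs version above accepts any number of extractors.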


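Answer 4:

You can average one attribute per group like this. Because getOccupancy() returns a double, the downstream collector is averagingDouble, and the result is a Map<String, Double> keyed by atDate:

Map<String, Double> averages = result.stream().collect(Collectors.groupingBy(p -> p.getAtDate(), Collectors.averagingDouble(p -> p.getOccupancy())));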

If you want more, you get the idea.
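
One way to read "more" is to run one grouping pass per attribute, giving one Map<String, Double> for each. The sketch below assumes that reading; the helper averageBy, the class PerAttributeAveragesDemo, and the TimePeriodCalc constructor and getters are hypothetical and not part of the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

class PerAttributeAveragesDemo {

    // Assumed completion of the question's class (constructor and getters are not in the original).
    static class TimePeriodCalc {
        private final double occupancy;
        private final double efficiency;
        private final String atDate;

        TimePeriodCalc(double occupancy, double efficiency, String atDate) {
            this.occupancy = occupancy;
            this.efficiency = efficiency;
            this.atDate = atDate;
        }

        double getOccupancy() { return occupancy; }
        double getEfficiency() { return efficiency; }
        String getAtDate() { return atDate; }
    }

    // Groups by atDate and averages the single attribute picked by the extractor.
    static Map<String, Double> averageBy(List<TimePeriodCalc> input,
                                         ToDoubleFunction<? super TimePeriodCalc> extractor) {
        return input.stream()
                .collect(Collectors.groupingBy(TimePeriodCalc::getAtDate,
                        Collectors.averagingDouble(extractor)));
    }

    public static void main(String[] args) {
        List<TimePeriodCalc> result = Arrays.asList(
                new TimePeriodCalc(12, 10, "A"),
                new TimePeriodCalc(2, 16, "A"),
                new TimePeriodCalc(4, 8, "B"));

        // One pass over the list per attribute
        Map<String, Double> occupancy = averageBy(result, TimePeriodCalc::getOccupancy);
        Map<String, Double> efficiency = averageBy(result, TimePeriodCalc::getEfficiency);

        System.out.println("occupancy: " + occupancy);
        System.out.println("efficiency: " + efficiency);
    }
}
```

This walks the source list once per attribute and leaves the averages in separate maps, so it is simpler but less compact than the single-collector answers above.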