Grouping a range of integers to the answer of a fu

2019-04-14 13:33发布

问题:

For a range of integers, I would like to apply an ("expensive") operation, filter out only those integers with interesting answers, then group on the answer.

This first snippet works, but it duplicates the operation ("modulus 2") both in code and computation:

IntStream.range(1, 10).boxed()
         .filter(p -> (p % 2 != 0))                     // Expensive computation
         .collect(Collectors.groupingBy(p -> (p % 2))); // Same expensive
                                                        // computation!

// {1=[1, 3, 5, 7, 9]} (Correct answer)

I tried mapping to the answer first, then filter, then group - but the initial Integer of course gets lost along the way:

IntStream.range(1, 10).boxed()
         .map(p -> p % 2)                        // Expensive computation
         .filter(p -> p != 0)
         .collect(Collectors.groupingBy(p -> p));

// {1=[1, 1, 1, 1, 1]} (Of course, wrong answer)

I would like to map to a tuple or something like that, but haven't figured out a clean way to do it.

回答1:

Well, since I described what I'd do, and there's been a bit of discussion about it, I figured I should write down what I was describing:

    class IntPair {
        final int input, result;
        IntPair(int i, int r) { input = i; result = r; }
    }

    Map<Integer, List<Integer>> output =
        IntStream.range(1, 10)
            .mapToObj(i -> new IntPair(i, i % 2))
            .filter(pair -> pair.result != 0)
            .collect(groupingBy(pair -> pair.result,
                mapping(pair -> pair.input, toList())));

Note that the helper class can (and probably should) be a nested class of some sort, or even a local class.

One thing that's nice about having names for the fields is that it does make it easier to understand what's going on. When I initially wrote this up, I had inadvertently reversed the roles of input and result in the grouping operation and thus I got the incorrect result. After rereading the code it was quite easy for me to see that I was grouping by the input values instead result values, and it was also very easy to fix. This would be harder to diagnose and fix if I had to use arr[0] and arr[1] or tuple.t1 and tuple.t2.



回答2:

Why not group on the result of the expensive computation and then filter the resulting map?

IntStream.range(1, 10).boxed()
        .collect(groupingBy(x -> x % 2))
        .entrySet().stream()
        .filter(e -> e.getKey() != 0)
        .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));


回答3:

If you want to implement this via single stream (without collecting to the intermediate map), you can do it like this:

IntStream.range(1, 10).boxed()
    .map(p -> new AbstractMap.SimpleEntry<>(p % 2, p))
    .filter(entry -> entry.getKey() != 0)
    .collect(Collectors.groupingBy(Entry::getKey,
             Collectors.mapping(Entry::getValue, Collectors.toList())));

If you don't mind using third-party code, my StreamEx library has syntactic sugar especially for such tasks:

IntStreamEx.range(1, 10).boxed()
     // map to Map.Entry where keys are your expensive computation
     // and values are input elements. The result is EntryStream
     // which implements the Stream<Map.Entry> and has additional methods
    .mapToEntry(p -> p % 2, Function.identity())
    .filterKeys(k -> k != 0)
    .grouping();

Internally it's pretty much the same as the first solution.



回答4:

The general solution is to remember the result of the computation. E.g. like suggested by Stuart Marks but if you don’t want to introduce a new type you may use an int array instead:

Map<Integer, List<Integer>> map = IntStream.range(1, 10)
    .mapToObj(i -> new int[]{i, i % 2})
    .filter(pair -> pair[1] != 0)
    .collect(groupingBy(pair -> pair[1],
        mapping(pair -> pair[0], toList())));

The specific solution to this special case is to replace the expensive operation i%2 by the simple operation i&1:

Map<Integer, List<Integer>> map = IntStream.range(1, 10)
    .filter(i-> (i&1)!=0)
    .boxed().collect(groupingBy(i->i&1));

That operation is so cheap that I wouldn’t care about it’s repetition. But if you do, of course

Map<Integer, List<Integer>> map = IntStream.range(1, 10)
    .filter(i-> (i&1)!=0)
    .boxed().collect(groupingBy(i->1));

or

Map<Integer, List<Integer>> map = Collections.singletonMap(1, 
    IntStream.range(1, 10).filter(i-> (i&1)!=0)
             .boxed().collect(toList()));

will solve the problem. It is, of course, not a re-usable solution but lambda expressions are one-use code fragments anyway.



回答5:

One way is to collect the numbers and answers to Map<Integer, Integer> first, then operate on the stream of entries:

IntStream.range(1, 10).boxed()
         .collect(toMap(p -> p, p -> p % 2)) 
         .entrySet().stream()
         .filter(p -> p.getValue() != 0)
         .collect(groupingBy(p -> p.getValue(),
                  mapping(p -> p.getKey(),
                  toList())));

Not sure about the performance implications if this though.



回答6:

So here is my solution :). Maybe not the best.

import java.util.stream.IntStream;
import java.util.stream.Collectors;
import java.util.Map;
import java.util.List;

public class Test {

    private static final class Tuple {
        private final Integer t1;
        private final Integer t2;

        private Tuple(final Integer t1, final Integer t2) {
            this.t1 = t1;
            this.t2 = t2;
        }

        public Integer getT1() { return t1; }
        public Integer getT2() { return t2; }
    }

    private static String getValues(final List<Tuple> list) {
        final StringBuilder stringBuilder = new StringBuilder();
        for(Tuple t : list) {
            stringBuilder.append(t.getT2()).append(", ");
        }
        return stringBuilder.toString();
    }

     public static void main(String []args){

        Map<Integer, List<Tuple>> results =
        IntStream.range(1, 10).boxed()
         .map(p -> new Tuple(p % 2, p))                     // Expensive computation
         .filter(p -> p.getT1() != 0)
         .collect(Collectors.groupingBy(p -> p.getT1())); 

        results.forEach((k, v) -> System.out.println(k + "=" + getValues(v)));
     }

}

Output is 1=1, 3, 5, 7, 9;

About performance:

sh-4.3# java -Xmx128M -Xms16M Test                                                                                                                                                                     
Time a1: 231                                                                                                                                                                                                  
Time a2: 125                                                                                                                                                                                                  
sh-4.3# java -Xmx128M -Xms16M Test
Time a1: 211                                                                                                                                                                                                  
Time a2: 127                                                                                                                                                                                                  
sh-4.3# java -Xmx128M -Xms16M Test
Time a1: 172                                                                                                                                                                                                  
Time a2: 100 

A1 is your first algorithm in the question and A2 is mine in this answer. So it is faster even with a helper class.

Here is performance measurements also including the algorithm in your answer (as A3):

sh-4.3# java -Xmx128M -Xms16M HelloWorld                                                                                                                                                                      
Time a1: 202                                                                                                                                                                                                  
Time a2: 113                                                                                                                                                                                                  
Time a2: 170                                                                                                                                                                                                  
sh-4.3# java -Xmx128M -Xms16M HelloWorld                                                                                                                                                                      
Time a1: 195                                                                                                                                                                                                  
Time a2: 114                                                                                                                                                                                                  
Time a2: 169 

I find it weird that yours is outperformed by mine, thought it would be more or less the same.