Group sequences of values

Posted 2019-01-12 00:54

Question:

I'm wondering if there's any nifty way to use the new Stream APIs to "group" sequences of values.

For example, split a series of integers into groups of integers, where each group is an ascending number sequence:

IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);
IntUnaryOperator next = i -> i + 1;

// DESIRED OUTPUT: [[1,2,3], [-1], [-1], [1,2], [1,2]]

Answer 1:

Unfortunately, the Stream API is not well suited to problems like this one, where processing each Stream element depends on the elements around it.

However, you can use the StreamEx library for this:

public static void main(String[] args) {
    IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);
    IntUnaryOperator next = i -> i + 1;

    List<List<Integer>> result = 
        IntStreamEx.of(seq).boxed().groupRuns((i1, i2) -> next.applyAsInt(i1) == i2).toList();

    System.out.println(result); // prints "[[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]"
}

groupRuns puts adjacent integers into the same List whenever the second one is equal to the next function applied to the first one. Finally, the resulting Stream of lists is collected into a List.
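For readers who would rather not pull in a library, the grouping rule described above can be sketched with a plain loop. This is only an illustration of the idea, not StreamEx's actual implementation; the class name GroupRunsSketch and the helper groupRuns below are made up for this example and work on a List rather than a Stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class GroupRunsSketch {
    // Hypothetical helper (not part of StreamEx): keeps adjacent elements
    // in the same group for as long as sameGroup(previous, current) holds.
    static <T> List<List<T>> groupRuns(List<T> input,
                                       BiPredicate<? super T, ? super T> sameGroup) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;
        T previous = null;
        for (T element : input) {
            // Start a new group at the first element or when the rule fails.
            if (current == null || !sameGroup.test(previous, element)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(element);
            previous = element;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Integer> seq = Arrays.asList(1, 2, 3, -1, -1, 1, 2, 1, 2);
        // prints "[[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]"
        System.out.println(groupRuns(seq, (a, b) -> a + 1 == b));
    }
}
```

The loop needs to remember only the previous element and the group currently being filled, which is exactly the kind of state that the standard Stream operations do not expose.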



Answer 2:

If you're willing to operate on an in-memory data structure, such as an array or list, it's possible to do this in standard Java 8 in just a couple of steps. This can be done using array programming techniques such as illustrated in my answer to this question. Using some clever conditionals similar to those used in Flown's answer to this question takes care of the edge cases in a neat way.

The key insight is to realize that a new segment (or group) begins at every point where the desired predicate is not met. That is, a new segment begins where seq[i-1] + 1 != seq[i]. Let's run an IntStream over the input and filter the indexes for this property and store the result in some array x:

    int[] seq = { 1, 2, 3, -1, -1, 1, 2, 1, 2 };
    int[] x = IntStream.range(1, seq.length)
                       .filter(i -> seq[i-1] + 1 != seq[i])
                       .toArray();

resulting in

    [3, 4, 5, 7]

This only gives us the interior boundaries of the segments. To get the starts and ends of the segments, we need to tack on the start of the first segment and the end of the last segment. We adjust the index range and add some conditionals to the filter:

    int[] x = IntStream.rangeClosed(0, seq.length)
                       .filter(i -> i == 0 || i == seq.length ||
                                    seq[i-1] + 1 != seq[i])
                       .toArray();

    [0, 3, 4, 5, 7, 9]

Now every adjacent pair of indexes is a subrange of the original array. We can use another stream to extract those subranges, giving the desired result:

    int[][] result =
        IntStream.range(0, x.length - 1)
                 .mapToObj(i -> Arrays.copyOfRange(seq, x[i], x[i+1]))
                 .toArray(int[][]::new);

    [[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]

This can be extracted into a function that itself takes a "next" function that computes the next value in the segment. That is, for any element, if the element to its right matches the result of the next-function, the elements are in the same segment; otherwise it's a segment boundary. Here's the code:

int[][] segments(int[] seq, IntUnaryOperator next) {
    int[] x = IntStream.rangeClosed(0, seq.length)
                       .filter(i -> i == 0 || i == seq.length ||
                               next.applyAsInt(seq[i-1]) != seq[i])
                       .toArray();

    return  IntStream.range(0, x.length - 1)
                     .mapToObj(i -> Arrays.copyOfRange(seq, x[i], x[i+1]))
                     .toArray(int[][]::new);
}

You'd call it like this:

    int[] seq = { 1, 2, 3, -1, -1, 1, 2, 1, 2 };
    System.out.println(Arrays.deepToString(segments(seq, i -> i + 1)));

    [[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]

Changing the next-function allows splitting the segments in a different way. For example, to split an array into segments of equal values, you'd do this:

    int[] seq = { 2, 2, 1, 3, 3, 1, 1, 1, 4, 4, 4 };
    System.out.println(Arrays.deepToString(segments(seq, i -> i)));

    [[2, 2], [1], [3, 3], [1, 1, 1], [4, 4, 4]]

The difficulty with using a next-function like this is that the condition for values belonging to a segment is limited. It would be nicer to provide a predicate that compares two adjacent values to test if they're in the same segment. We can do that using a BiPredicate<Integer, Integer> if we're willing to pay the cost of boxing:

int[][] segments(int[] input, BiPredicate<Integer, Integer> pred) {
    int[] x = IntStream.rangeClosed(0, input.length)
                       .filter(i -> i == 0 || i == input.length ||
                               !pred.test(input[i-1], input[i]))
                       .toArray();

    return  IntStream.range(0, x.length - 1)
                     .mapToObj(i -> Arrays.copyOfRange(input, x[i], x[i+1]))
                     .toArray(int[][]::new);
}

This allows gathering segments using a different criterion, for example, monotonically increasing segments:

    int[] seq = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };
    System.out.println(Arrays.deepToString(segments(seq, (a, b) -> b > a)));

    [[3], [1, 4], [1, 5, 9], [2, 6], [5], [3]]

This could be specialized to use a primitive bi-predicate over two int values, or it could be generalized to allow using a BiPredicate of any type over input of any type.
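Both variations might look roughly like the sketch below. The names SegmentsVariants, IntBiPredicate, intSegments and listSegments are invented for this illustration; the JDK has no primitive int bi-predicate, so one is declared here. The generic version assumes a List with cheap index access (such as ArrayList) and returns subList views rather than copies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SegmentsVariants {
    // Hypothetical primitive counterpart of BiPredicate<Integer, Integer>.
    @FunctionalInterface
    interface IntBiPredicate {
        boolean test(int left, int right);
    }

    // Specialized: same technique as segments(), but without boxing.
    static int[][] intSegments(int[] input, IntBiPredicate pred) {
        int[] x = IntStream.rangeClosed(0, input.length)
                           .filter(i -> i == 0 || i == input.length ||
                                        !pred.test(input[i - 1], input[i]))
                           .toArray();
        return IntStream.range(0, x.length - 1)
                        .mapToObj(i -> Arrays.copyOfRange(input, x[i], x[i + 1]))
                        .toArray(int[][]::new);
    }

    // Generalized: any element type; each segment is a view of the input list.
    static <T> List<List<T>> listSegments(List<T> input,
                                          BiPredicate<? super T, ? super T> pred) {
        int[] x = IntStream.rangeClosed(0, input.size())
                           .filter(i -> i == 0 || i == input.size() ||
                                        !pred.test(input.get(i - 1), input.get(i)))
                           .toArray();
        return IntStream.range(0, x.length - 1)
                        .mapToObj(i -> input.subList(x[i], x[i + 1]))
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] seq = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };
        // prints "[[3], [1, 4], [1, 5, 9], [2, 6], [5], [3]]"
        System.out.println(Arrays.deepToString(intSegments(seq, (a, b) -> b > a)));

        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // prints "[[apple, avocado], [banana, blueberry], [cherry]]"
        System.out.println(listSegments(words, (a, b) -> a.charAt(0) == b.charAt(0)));
    }
}
```

Because listSegments hands back subList views, later structural changes to the input list would invalidate the segments; wrap each one in new ArrayList<>(...) if independent copies are needed.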



Answer 3:

Not as elegant as @Tunaki's solution, but using "pure" Java 8 streams:

IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);

// singleton is a static import of java.util.Collections.singleton
Deque<Deque<Integer>> r = new ArrayDeque<>(singleton(new ArrayDeque<>()));

seq.filter(i -> !r.getLast().isEmpty() && r.getLast().getLast() + 1 != i || !r.getLast().add(i))
            .forEach(i -> r.add(new ArrayDeque<>(singleton(i))));

System.out.println(r); // prints: [[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]

I use the Deque class here purely to keep the code compact: it provides a getLast() method, which List does not, so the List version would be more verbose.
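Note that the filter above mutates r as a side effect, so it relies on the stream being processed sequentially and in order. As a possible alternative (a different technique, not the code from this answer), the same Deque-based grouping can be sketched with the three-argument IntStream.collect, where the combiner stitches together a run that was split across two partial results. The class name CollectRunsSketch and the method runs are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.stream.IntStream;

class CollectRunsSketch {
    // Hypothetical helper: groups an IntStream into runs of consecutive values.
    static Deque<Deque<Integer>> runs(IntStream seq) {
        return seq.collect(
            ArrayDeque::new,
            // Accumulator: extend the last run or start a new one.
            (groups, i) -> {
                if (groups.isEmpty() || groups.getLast().getLast() + 1 != i) {
                    groups.add(new ArrayDeque<>());
                }
                groups.getLast().add(i);
            },
            // Combiner: if the last run on the left continues into the first
            // run on the right, merge those two runs before appending the rest.
            (left, right) -> {
                if (!left.isEmpty() && !right.isEmpty()
                        && left.getLast().getLast() + 1 == right.getFirst().getFirst()) {
                    left.getLast().addAll(right.removeFirst());
                }
                left.addAll(right);
            });
    }

    public static void main(String[] args) {
        IntStream seq = IntStream.of(1, 2, 3, -1, -1, 1, 2, 1, 2);
        // prints "[[1, 2, 3], [-1], [-1], [1, 2], [1, 2]]"
        System.out.println(runs(seq));
    }
}
```

With this shape all state lives inside the containers that collect creates, so the result should be the same whether or not the stream is parallel.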