Collecting stream back into the same collection type

Posted 2019-04-09 02:45

Question:

Suppose I have a collection of an unknown type. I want to stream it, perform some operations on the stream, and collect the result back into the same collection type as the original. For instance:

Collection<? extends Integer> getBigger(Collection<? extends Integer> col, int value) {
    return col.stream().filter(v -> v > value).collect(????);
} 

The idea of this incomplete example is to return a List if col is a List (or any subclass of it), a Set if col is a Set, and so on. The method name and the actual stream operations are not important; I've included them only to illustrate the question. So, is it possible?

Answer 1:

It is not possible without violating the principle on which the Java streams framework is built: a stream is abstracted from its physical representation.

The sequence of bulk data operations runs as a pipeline: a source, zero or more intermediate operations, and a terminal operation.

A stream is somewhat like Schrödinger's cat: it is not materialized until you call a terminal operation. Stream handling is completely abstract and detached from the original stream source.

If you want to work at such a low level with your original data storage, don't be ashamed to simply avoid streams. They are just a tool, not anything sacred. The introduction of streams did not make the good old collections any worse; they are as good as ever, with the added value of internal iteration through the new Iterable.forEach() method.
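
As a minimal sketch of the non-stream route: if the input collection is mutable and changing it in place is acceptable (the question states neither, so both are assumptions), Collection.removeIf keeps the original collection type trivially, because no new collection is created. The class name InPlaceFilter and the method keepBigger are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

class InPlaceFilter {
    // Hypothetical helper: removes every element that is not bigger than value.
    // It mutates and returns the same instance, so the runtime type is unchanged.
    // Unmodifiable or fixed-size collections may throw UnsupportedOperationException.
    static <C extends Collection<Integer>> C keepBigger(C col, int value) {
        col.removeIf(v -> v <= value);
        return col;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>(Arrays.asList(1, 5, 8, 10));
        TreeSet<Integer> set = new TreeSet<>(Arrays.asList(1, 5, 8, 10));
        System.out.println(keepBigger(list, 6)); // [8, 10]
        System.out.println(keepBigger(set, 6));  // [8, 10]
    }
}
```

The trade-off is that the caller's collection is modified; copy it first if the original contents are still needed.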


Added to satisfy your curiosity :)

A possible solution follows. I don't like it myself, and I have not been able to solve all the generics issues there, but it works with limitations.

The idea is to create a collector that returns the same type as the input collection. However, not all collections provide a nullary (no-argument) constructor, and without one the Class.newInstance() method does not work. There is also the awkwardness of handling checked exceptions inside a lambda expression. (It is discussed in this nice answer: https://stackoverflow.com/a/22919112/2886891)

public Collection<Integer> getBiggerThan(Collection<Integer> col, int value) {
    // Collection below is an example of one of the rare appropriate 
    // uses of raw types. getClass returns the runtime type of col, and 
    // at runtime all type parameters have been erased.
    @SuppressWarnings("rawtypes")
    final Class<? extends Collection> clazz = col.getClass();
    System.out.println("Input collection type: " + clazz);
    final Supplier<Collection<Integer>> supplier = () -> {
        try {
            return clazz.newInstance();
        }
        catch (InstantiationException | IllegalAccessException e) {
            throw new RuntimeException(
                    "A checked exception caught inside lambda", e);
        }
    };
    // After all the ugly preparatory code, enjoy the clean pipeline:
    return col.stream()
            .filter(v -> v > value)
            .collect(supplier, Collection::add, Collection::addAll);
}

As you can see, it works in general, provided your original collection has a nullary constructor.

public void test() {
    final Collection<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    final Collection<Integer> arrayList = new ArrayList<>(numbers);
    final Collection<Integer> arrayList2 = getBiggerThan(arrayList, 6);
    System.out.println(arrayList2);
    System.out.println(arrayList2.getClass());
    System.out.println();

    final Collection<Integer> set = new HashSet<>(arrayList);
    final Collection<Integer> set2 = getBiggerThan(set, 6);
    System.out.println(set2);
    System.out.println(set2.getClass());
    System.out.println();

    // This does not work: Arrays.asList() returns a java.util.Arrays$ArrayList,
    // which does not provide a nullary constructor
    final Collection<Integer> numbers2 = getBiggerThan(numbers, 6);
}
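
A hedged variant of the same idea for newer JDKs: Class.newInstance() has been deprecated since Java 9 in favour of getDeclaredConstructor().newInstance(). The sketch below swaps in that call and otherwise follows the code above. The class name ReflectiveCollect and the choice of IllegalStateException are assumptions made for this example. It still fails for types without an accessible no-arg constructor, such as the list returned by Arrays.asList().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.function.Supplier;

class ReflectiveCollect {
    static Collection<Integer> getBiggerThan(Collection<Integer> col, int value) {
        @SuppressWarnings("rawtypes")
        final Class<? extends Collection> clazz = col.getClass();
        @SuppressWarnings("unchecked")
        final Supplier<Collection<Integer>> supplier = () -> {
            try {
                // Non-deprecated replacement for clazz.newInstance()
                return clazz.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                // Covers a missing, inaccessible, or failing no-arg constructor
                throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
            }
        };
        return col.stream()
                .filter(v -> v > value)
                .collect(supplier, Collection::add, Collection::addAll);
    }

    public static void main(String[] args) {
        Collection<Integer> list = new ArrayList<>(Arrays.asList(1, 5, 8, 10));
        Collection<Integer> set = new HashSet<>(list);
        System.out.println(getBiggerThan(list, 6).getClass()); // class java.util.ArrayList
        System.out.println(getBiggerThan(set, 6).getClass());  // class java.util.HashSet
        try {
            getBiggerThan(Arrays.asList(1, 5, 8, 10), 6);
        } catch (IllegalStateException e) {
            System.out.println("Failed as expected: " + e.getMessage());
        }
    }
}
```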


Answer 2:

There are two issues here: (1) the runtime type (class) of the input and its result, and (2) the compile-time type of the input and its result.

For (1), it may seem strange, but in general it's not possible in Java to create a copy of an instance of an arbitrary class. Using getClass().newInstance() might not work if the class doesn't have an accessible no-arg constructor or if it's immutable. The object might not be Cloneable either. Thus, the caller needs to pass in a supplier that's responsible for creating an instance of the right result class.

For (2), a suitable dose of generics can make this type-safe at compile time.

<T extends Comparable<T>, C extends Collection<T>> C getBigger(
        C col, T value, Supplier<C> supplier) {
    return col.stream()
              .filter(v -> v.compareTo(value) > 0)
              .collect(Collectors.toCollection(supplier::get));
}

Note that there is a bound of Comparable<T> on the type parameter T so that the caller is restricted to passing a collection of things that are comparable. This lets us use compareTo to compare the values. We also use the Collectors.toCollection method and pass the supplier's get method through to it.

Examples of use:

List<Integer> input1 = Arrays.asList(1, 4, 9, 13, 14, 22);
List<Integer> filtered1 = getBigger(input1, 10, ArrayList::new);

Set<String> input2 = new HashSet<>();
input2.add("foo");
input2.add("bar");
input2.add("baz");
input2.add("qux");
Set<String> filtered2 = getBigger(input2, "c", HashSet::new);
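
For convenience, here is the method and the calls above combined into one runnable class, as a sketch. The class name SupplierCollect and the printed checks are additions for illustration, and the supplier is passed to Collectors.toCollection directly rather than as supplier::get, which behaves the same way. Note that the result's runtime class is whatever the supplier creates, not necessarily the class of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class SupplierCollect {
    static <T extends Comparable<T>, C extends Collection<T>> C getBigger(
            C col, T value, Supplier<C> supplier) {
        return col.stream()
                  .filter(v -> v.compareTo(value) > 0)
                  .collect(Collectors.toCollection(supplier));
    }

    public static void main(String[] args) {
        List<Integer> input1 = Arrays.asList(1, 4, 9, 13, 14, 22);
        List<Integer> filtered1 = getBigger(input1, 10, ArrayList::new);
        System.out.println(filtered1); // [13, 14, 22]

        Set<String> input2 = new HashSet<>(Arrays.asList("foo", "bar", "baz", "qux"));
        Set<String> filtered2 = getBigger(input2, "c", HashSet::new);
        // HashSet iteration order is unspecified, so compare as sets
        Set<String> expected = new HashSet<>(Arrays.asList("foo", "qux"));
        System.out.println(filtered2.equals(expected)); // true
    }
}
```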


Answer 3:

Since the actual underlying type is known only to the caller of your method, it should be the caller's responsibility to collect the result into whatever Collection type they want (e.g. using Collectors.toCollection(CustomCollectionType::new)). So your method should return a Stream. It can take a Collection or a Stream, depending on convenience.
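
A minimal sketch of this approach, assuming Integer elements as in the question: the method returns a lazy Stream and each caller chooses the result collection. The class name StreamReturning is hypothetical, and TreeSet stands in for whatever type the caller actually wants.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReturning {
    // Returns a stream of the elements bigger than value; nothing is collected here.
    static Stream<Integer> getBigger(Collection<Integer> col, int value) {
        return col.stream().filter(v -> v > value);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 4, 9, 13, 14, 22);

        // Each caller decides how to materialize the result
        List<Integer> asList = getBigger(input, 10).collect(Collectors.toList());
        TreeSet<Integer> asSet = getBigger(input, 10)
                .collect(Collectors.toCollection(TreeSet::new));

        System.out.println(asList); // [13, 14, 22]
        System.out.println(asSet);  // [13, 14, 22]
    }
}
```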