I have the following function for the unification of multiple collections (includes repeated elements):
public static <T> List<T> unify(Collection<T>... collections) {
return Arrays.stream(collections)
.flatMap(Collection::stream)
.collect(Collectors.toList());
}
It would be nice to have a function with a similar signature for the intersection of collections (using type equality). For example:
public static <T> List<T> intersect(Collection<T>... collections) {
//Here is where the magic happens
}
I found an implementation of the intersect function, but it doesnt use streams:
public static <T> Set<T> intersect(Collection<? extends Collection<T>> collections) {
Set<T> common = new LinkedHashSet<T>();
if (!collections.isEmpty()) {
Iterator<? extends Collection<T>> iterator = collections.iterator();
common.addAll(iterator.next());
while (iterator.hasNext()) {
common.retainAll(iterator.next());
}
}
return common;
}
Is there any way to implement something similar to the unify function making use of streams? Im not so experienced in java8/stream api, because of that some advice would be really helpful.
I think maybe it would make more sense to use Set instead of List (maybe that was a typo in your question):
You can write your own collector in some utility class and use it:
The usage would be pretty simple:
Note however that collector cannot short-circuit: even if intermediate result will be an empty collection, it will still process the rest of the stream.
Such collector is readily available in my free StreamEx library (see
MoreCollectors.intersecting()
). It works with normal streams like above, but if you use it with StreamEx (which extends normal stream) it becomes short-circuiting: the processing may actually stop early.While it’s tempting to think of
retainAll
as a black-box bulk operation that must be the most efficient way to implement an intersection operation, it just implies iterating over the entire collection and testing for each element whether it is contained in the collection passed as argument. The fact that you are calling it on aSet
does not imply any advantage, as it is the other collection, whosecontains
method will determine the overall performance.This implies that linearly scanning a set and testing each element for containment within all other collections will be on par with performing
retainAll
for each collection. Bonus points for iterating over the smallest collection in the first place:or, alternatively
and here's a Set implementation. retainAll() is a Collection method, so it works on all of them.
And with List<> if order is important.
Java Collections lets them look almost identical. If required, you could filter the List to be distinct as it may contain duplicates.
You can write it with streams as follows:
The 3-argument
collect
replicates yourretainAll
logicThe streams implementation gives you the flexibility to tweak the logic more easily. For example, if all your collections are sets, you might want to start with the smallest set instead of the first one (for performance). To do that, you would replace
findFirst()
withmin(comparing(Collection::size))
and get rid of theskip(1)
. Or you could see if you get better performance with the type of data you work with by running the second stream in parallel and all you would need to do is changestream
toparallelStream
.