What is the most efficient way to find the size of the intersection of two non-sparse Sets in Java? This is an operation I will be calling on large sets a very large number of times, so optimisation is important. I cannot modify the original sets.
I have looked at Apache Commons CollectionUtils.intersection which appears to be quite slow. My current approach is to take the smaller of the two sets, clone it, and then call .retainAll on the larger of the two sets.
public static int getIntersection(Set<Long> set1, Set<Long> set2) {
boolean set1IsLarger = set1.size() > set2.size();
Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1);
cloneSet.retainAll(set1IsLarger ? set1 : set2);
return cloneSet.size();
}
That is a good approach. You should be getting O(n) performance from your current solution.
If you are computing the intersection just for the sake of counting how many elements are there in the set, I suggest you just need to count the intersection directly instead of building the set and then calling
size()
.My function for counting:
FYI, if any collection of sets are all sorted using the same comparision relation, then you can iterate their intersection in time N * M, where N is the size of the smallest set and M is the number of sets.
Implementation left as exercise to reader. Here's an example.