Efficiently compute Intersection of two Sets in Ja

2019-01-04 23:54发布

What is the most efficient way to find the size of the intersection of two non-sparse Sets in Java? This is an operation I will be calling on large sets a very large number of times, so optimisation is important. I cannot modify the original sets.

I have looked at Apache Commons CollectionUtils.intersection which appears to be quite slow. My current approach is to take the smaller of the two sets, clone it, and then call .retainAll on the larger of the two sets.

public static int getIntersection(Set<Long> set1, Set<Long> set2) {
    boolean set1IsLarger = set1.size() > set2.size();
    Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1);
    cloneSet.retainAll(set1IsLarger ? set1 : set2);
    return cloneSet.size();
}

9条回答
来,给爷笑一个
2楼-- · 2019-01-05 00:26

That is a good approach. You should be getting O(n) performance from your current solution.

查看更多
时光不老,我们不散
3楼-- · 2019-01-05 00:29

If you are computing the intersection just for the sake of counting how many elements are there in the set, I suggest you just need to count the intersection directly instead of building the set and then calling size().

My function for counting:

/**
 * Computes the size of intersection of two sets
 * @param small first set. preferably smaller than the second argument
 * @param large second set;
 * @param <T> the type
 * @return size of intersection of sets
 */
public <T> int countIntersection(Set<T> small, Set<T> large){
    //assuming first argument to be smaller than the later;
    //however double checking to be sure
    if (small.size() > large.size()) {
        //swap the references;
        Set<T> tmp = small;
        small = large;
        large = tmp;
    }
    int result = 0;
    for (T item : small) {
        if (large.contains(item)){
            //item found in both the sets
            result++;
        }
    }
    return result;
}
查看更多
做自己的国王
4楼-- · 2019-01-05 00:35

FYI, if any collection of sets are all sorted using the same comparision relation, then you can iterate their intersection in time N * M, where N is the size of the smallest set and M is the number of sets.

Implementation left as exercise to reader. Here's an example.

查看更多
登录 后发表回答