I have sets of pairs of ints, like

set<pair<int,int>> x1, x2, ..., xn

(n can be between 2 and 20). What is the fastest way to find the union of those sets?

Sorry if I wasn't clear at the beginning: I meant fast in performance; memory allocation is not a problem.
Try std::set_union from the <algorithm> header.
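For two sets, a minimal sketch of how that might look (union_of_two and the alias P are names I've made up here): a std::set already iterates in sorted order, which is exactly what set_union requires of its inputs.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using P = std::pair<int, int>;

// std::set_union merges two sorted ranges into an output iterator,
// emitting elements that occur in both ranges only once.
std::vector<P> union_of_two(const std::set<P>& a, const std::set<P>& b) {
    std::vector<P> out;
    std::set_union(a.begin(), a.end(),
                   b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For n sets you would apply this pairwise, folding each set into the running result (reusing two buffers to avoid repeated allocation).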
To save on memory allocations and improve locality, it'd be better to use a single vector<T> as working memory. Construct a vector<T> and reserve the total number of elements in all of the sets (counting duplicates). Then, starting with the empty range [v.begin(), v.begin()), extend it to a set-like (unique, sorted) range by appending the contents of each set, merging and uniquifying as you go.

You could apply std::set_union repeatedly, or simply insert all the sets into a result set (duplicate items are eliminated by the set). If the number of items is very small, you can try inserting everything into a vector, sorting it, and running std::unique on the vector.
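A sketch of that single-vector scheme, assuming the inputs arrive as a vector of sets (union_into_vector and the P alias are my names; this version dedupes once at the end rather than after every merge):

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using P = std::pair<int, int>;

std::vector<P> union_into_vector(const std::vector<std::set<P>>& sets) {
    std::size_t total = 0;
    for (const auto& s : sets) total += s.size();

    std::vector<P> v;
    v.reserve(total);  // one allocation covers everything, duplicates included

    for (const auto& s : sets) {
        auto mid = static_cast<std::ptrdiff_t>(v.size());
        v.insert(v.end(), s.begin(), s.end());           // append a sorted run
        std::inplace_merge(v.begin(), v.begin() + mid, v.end());
    }
    v.erase(std::unique(v.begin(), v.end()), v.end());   // drop duplicates
    return v;
}
```

One caveat: std::inplace_merge may itself allocate a temporary buffer internally, so "single allocation" is only approximately true.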
Unfortunately, I believe you are limited to a linear O(N) amount of work at minimum, since the union is simply a combination of the elements of all the sets and every element must be visited. Assuming that the result needs to be a set too, you have no choice but to insert every element of each x_i into that result set; the obvious implementation does exactly that. The remaining question is whether this can be beaten for speed.
The single-element insert takes a position hint which, if correct, speeds up insertion. So it might turn out that a loop of hinted single-element inserts is faster than the bulk x1.insert(x2.begin(), x2.end()).
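A hinted-insert loop along those lines might be sketched like this (merge_into is a name I've invented; note that the hint only ever affects speed, never correctness):

```cpp
#include <set>
#include <utility>

using P = std::pair<int, int>;

// Insert x2 into x1 one element at a time, hinting each insert with the
// iterator returned by the previous one. Because both sets are sorted,
// consecutive inserts tend to land next to each other, which typical
// implementations turn into near-constant-time insertion.
void merge_into(std::set<P>& x1, const std::set<P>& x2) {
    auto hint = x1.begin();
    for (const auto& e : x2)
        hint = x1.insert(hint, e);
}
```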
It depends on the data, though: that position hint may or may not be accurate. You can ensure that it is by putting all the elements in order before you start, for which the best tool is probably std::set_union. That algorithm might better be named merge_and_dedupe_sorted_ranges, because what it does has nothing particularly to do with std::set. You could either set_union into intermediate vectors, or else into freshly created sets.

My concern with using set_union is that, in order to get the benefit of adding the elements to a set in increasing order, you need to create a new empty container each time you call it (because if it's not empty, the elements added need to interleave with the values already in it). The overhead of these containers might be higher than the overhead of inserting into a set in arbitrary order: you would have to test it.

Find the union of the smallest sets first. That is, order your sets by size, compute the union of the two smallest sets, delete those sets, and insert their union into your list of sets according to its size.
If you had a measure of how similar two sets are likely to be, then your best bet would be to find the union of the most similar sets first. That is, prefer union operations that eliminate duplicates early.

Edit: and for each union operation between two sets, merge the smaller set into the bigger set.
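A sketch combining both suggestions (smallest sets first, always merging the smaller set into the bigger): union_smallest_first is a made-up name, and since n is at most 20, re-sorting the list each round costs next to nothing.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

using P = std::pair<int, int>;

std::set<P> union_smallest_first(std::vector<std::set<P>> sets) {
    while (sets.size() > 1) {
        // Order the sets by size so the two smallest sit at the front.
        std::sort(sets.begin(), sets.end(),
                  [](const std::set<P>& a, const std::set<P>& b) {
                      return a.size() < b.size();
                  });
        // Merge the smallest set into the next smallest (smaller into bigger),
        // then remove the consumed set from the list.
        sets[1].insert(sets[0].begin(), sets[0].end());
        sets.erase(sets.begin());
    }
    return sets.empty() ? std::set<P>{} : std::move(sets.front());
}
```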