How do I compute the intersection and union of sets of type tr1::unordered_set in C++? I can't find much reference material about it.
Any reference and code will be highly appreciated. Thank you very much.
Update: I assumed tr1::unordered_set would provide functions for intersection, union, and difference, since those are the basic set operations.
Of course I can write a function myself, but I wonder whether tr1 has built-in functions for this.
Thank you very much.
I see that set_intersection() et al. from the algorithm header won't work as they explicitly require their inputs to be sorted -- guess you ruled them out already.
It occurs to me that the "naive" approach of iterating through hash A and looking up every element in hash B should actually give you near-optimal performance, since successive lookups in hash B will be going to the same hash bucket (assuming that both hashes are using the same hash function). That should give you decent memory locality, even though these buckets are almost certainly implemented as linked lists.
Here's some code for unordered_set_intersection(); you can tweak it to make the versions for set union and set difference:
#include <algorithm>  // std::find

// Writes every element of [b1, e1) that also occurs in [b2, e2) to out.
template <typename InIt1, typename InIt2, typename OutIt>
OutIt unordered_set_intersection(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
    while (!(b1 == e1)) {
        if (!(std::find(b2, e2, *b1) == e2)) {
            *out = *b1;
            ++out;
        }
        ++b1;
    }
    return out;
}
Assuming you have two unordered_sets, x and y, you can put their intersection in z using:
unordered_set_intersection(
    x.begin(), x.end(),
    y.begin(), y.end(),
    std::inserter(z, z.begin())  // std::inserter is declared in <iterator>
);
Unlike bdonlan's answer, this will actually work for any key types, and any combination of container types (although using set_intersection() will of course be faster if the source containers are sorted).
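One caveat about the function above: std::find walks the second range element by element, so each probe is a linear scan rather than a hash lookup. Below is a minimal sketch of a variant that keeps the output-iterator interface but takes the second container by reference and probes it through its own find() member. The name unordered_set_intersection_lookup is hypothetical, and the sketch assumes the C++11 std::unordered_set rather than the tr1 one.

```cpp
#include <iterator>
#include <unordered_set>

// Hypothetical variant (not from the original answer): walks [b1, e1) and
// probes `lookup` via its find() member, so each probe is a hash lookup
// instead of a linear scan over a second iterator range.
template <typename InIt, typename Lookup, typename OutIt>
OutIt unordered_set_intersection_lookup(InIt b1, InIt e1,
                                        const Lookup& lookup, OutIt out) {
    for (; b1 != e1; ++b1) {
        if (lookup.find(*b1) != lookup.end()) {
            *out = *b1;
            ++out;
        }
    }
    return out;
}
```

It would be called as unordered_set_intersection_lookup(x.begin(), x.end(), y, std::inserter(z, z.begin())). The trade-off is that the second argument must now be a container with a find() member, so the "any combination of container types" property only holds for the first range.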
NOTE: If bucket occupancies are high, it's probably faster to copy each hash into a vector, sort them and set_intersection() them there, since searching within a bucket containing n elements is O(n).
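The copy-and-sort route from the note can be sketched as follows. The helper name sorted_intersection is hypothetical, and the sketch assumes C++11 std::unordered_set and an element type that supports operator<.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies both sets into vectors, sorts them, and lets
// std::set_intersection merge the two sorted ranges. Requires operator< on T.
template <typename T>
std::vector<T> sorted_intersection(const std::unordered_set<T>& a,
                                   const std::unordered_set<T>& b) {
    std::vector<T> va(a.begin(), a.end());
    std::vector<T> vb(b.begin(), b.end());
    std::sort(va.begin(), va.end());
    std::sort(vb.begin(), vb.end());

    std::vector<T> result;
    std::set_intersection(va.begin(), va.end(), vb.begin(), vb.end(),
                          std::back_inserter(result));
    return result;  // sorted; one entry per common element
}
```

The result comes back as a sorted vector; if an unordered_set is needed, it can be built from the vector's iterator range.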
There's nothing much to it - for intersection, just go through every element of one set and check that it's in the other. For union, add all items from both input sets.
For example:
void us_isect(std::tr1::unordered_set<int> &out,
              const std::tr1::unordered_set<int> &in1,
              const std::tr1::unordered_set<int> &in2)
{
    out.clear();
    // Iterate over the smaller set and look up in the larger one.
    if (in2.size() < in1.size()) {
        us_isect(out, in2, in1);
        return;
    }
    for (std::tr1::unordered_set<int>::const_iterator it = in1.begin(); it != in1.end(); ++it)
    {
        if (in2.find(*it) != in2.end())
            out.insert(*it);
    }
}

void us_union(std::tr1::unordered_set<int> &out,
              const std::tr1::unordered_set<int> &in1,
              const std::tr1::unordered_set<int> &in2)
{
    out.clear();
    out.insert(in1.begin(), in1.end());
    out.insert(in2.begin(), in2.end());
}
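The question also asks about difference, which follows the same pattern with the lookup test inverted. This is a minimal sketch: the name us_diff is hypothetical, and it uses the C++11 std::unordered_set on the assumption that the tr1 version would look the same apart from the namespace.

```cpp
#include <unordered_set>

// Hypothetical us_diff: fills `out` with the elements of in1 that are not
// in in2. `out` must not be the same object as in1 or in2, because it is
// cleared before the inputs are read.
void us_diff(std::unordered_set<int>& out,
             const std::unordered_set<int>& in1,
             const std::unordered_set<int>& in2) {
    out.clear();
    for (std::unordered_set<int>::const_iterator it = in1.begin();
         it != in1.end(); ++it) {
        if (in2.find(*it) == in2.end())
            out.insert(*it);
    }
}
```

Unlike intersection, difference is not symmetric, so the size-based swap of the two inputs cannot be applied here.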
Based on the previous answer: a C++11 version for any set type that supports a fast lookup function find() (return values are moved efficiently).
/** Intersection and union functions for unordered containers which support a fast lookup function find().
 *  Return values are moved by move semantics; for C++11/C++14 this is efficient, otherwise it results in a copy.
 */
namespace unorderedHelpers {

template<typename UnorderedIn1, typename UnorderedIn2,
         typename UnorderedOut = UnorderedIn1>
UnorderedOut makeIntersection(const UnorderedIn1 &in1, const UnorderedIn2 &in2)
{
    // Iterate over the smaller container and look up in the larger one.
    if (in2.size() < in1.size()) {
        return makeIntersection<UnorderedIn2,UnorderedIn1,UnorderedOut>(in2, in1);
    }
    UnorderedOut out;
    auto e = in2.end();
    for(auto & v : in1)
    {
        if (in2.find(v) != e){
            out.insert(v);
        }
    }
    return out;
}

template<typename UnorderedIn1, typename UnorderedIn2,
         typename UnorderedOut = UnorderedIn1>
UnorderedOut makeUnion(const UnorderedIn1 &in1, const UnorderedIn2 &in2)
{
    UnorderedOut out;
    out.insert(in1.begin(), in1.end());
    out.insert(in2.begin(), in2.end());
    return out;
}

}