I have a vector containing n
elements. I need to choose a subset of m
elements randomly from the vector without repetition. What is the most efficient way of doing this? I need to do this several thousands of times in my code.
The solution on top of my mind is to use rand()
to generate a random number k
between 0
and n
. Then pick the k
th element in the vector and insert it into a std::set
. Keep doing this till the set's size becomes equal to m
. I am now assured that the set contains m
unique elements randomly chosen from the set of n
elements.
What are the other possible solutions?
Thanks.
You want a Fisher-Yates shuffle (stop after M iterations):
template<class BidiIter >
BidiIter random_unique(BidiIter begin, BidiIter end, size_t num_random) {
size_t left = std::distance(begin, end);
while (num_random--) {
BidiIter r = begin;
std::advance(r, rand()%left);
std::swap(*begin, *r);
++begin;
--left;
}
return begin;
}
Demo at http://ideone.com/3A3cv. This is significantly faster than std::random_shuffle
when you only need a few random numbers out of the set, and should be just about the same speed even if N==M
.
One way you could do this is to create a list of all the indices of the vector, shuffle them, and take the first n
to be the indices of the selected objects:
struct rangegenerator {
rangegenerator(int init) : start(init) { }
int operator()() {
return start++;
}
int start;
};
vector<T> numbers; // this is filled somewhere else
vector<int> indices(numbers.size());
generate(begin(indices), end(indices), rangegenerator(0));
random_shuffle(begin(indices), end(indices));
// then take the first n elements of indices and use them as indices into numbers