C++. Visual Studio 2010.
I have a std::vector
V of N unique elements (heavy structs). How can efficiently pick M random, unique, elements from it?
E.g. V contains 10 elements: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } and I pick three...
- 4, 0, 9
- 0, 7, 8
- But NOT this: 0, 5, 5 <--- not unique!
STL is preferred. So, something like this?
std::minstd_rand gen; // linear congruential engine??
std::uniform_int<int> unif(0, v.size() - 1);
gen.seed((unsigned int)time(NULL));
// ...?
// Or is there a good solution using std::random_shuffle for heavy objects?
Create a random permutation of the range 0, 1, ..., N - 1
and pick the first M
of them; use those as indices into your original vector.
A random permutation is easily made with the standard library by using std::iota
together with std::random_shuffle
:
std::vector<Heavy> v; // given
std::vector<unsigned int> indices(V.size());
std::iota(indices.begin(), indices.end(), 0);
std::random_shuffle(indices.begin(), indices.end());
// use V[indices[0]], V[indices[1]], ..., V[indices[M-1]]
You can supply random_shuffle
with a random number generator of your choice; check the documentation for details.
Most of the time, the method provided by Kerrek is sufficient. But if N is very large, and M is orders of magnitude smaller, the following method may be preferred.
Create a set of unsigned integers, and add random numbers to it in the range [0,N-1] until the size of the set is M. Then use the elements at those indexes.
std::set<unsigned int> indices;
while (indices.size() < M)
indices.insert(RandInt(0,N-1));
Since you wanted it to be efficient, I think you can get an amortised O(M)
, assuming you have to perform that operation a lot of times. However, this approach is not reentrant.
First of all create a local (i.e. static
) vector of std::vector<...>::size_type
(i.e. unsigned
will do) values.
If you enter your function, resize the vector to match N
and fill it with values from the old size to N-1
:
static std::vector<unsigned> indices;
if (indices.size() < N) {
indices.reserve(N);
for (unsigned i = indices.size(); i < N; i++) {
indices.push_back(i);
}
}
Then, randomly pick M
unique numbers from that vector:
std::vector<unsigned> result;
result.reserver(M);
for (unsigned i = 0; i < M; i++) {
unsigned const r = getRandomNumber(0,N-i); // random number < N-i
result.push_back(indices[r]);
indices[r] = indices[N-i-1];
indices[N-i-1] = r;
}
Now, your result is sitting in the result
vector.
However, you still have to repair your changes to indices
for the next run, so that indices
is monotonic again:
for (unsigned i = N-M; i < N; i++) {
// restore previously changed values
indices[indices[i]] = indices[i];
indices[i] = i;
}
But this approach is only useful, if you have to run that algorithm a lot and N
doesn't grow so big that you cannot live with indices
eating up RAM all the the time.