Pick a unique random subset from a set of unique v

2019-01-18 12:20发布

问题:

C++. Visual Studio 2010.

I have a std::vector V of N unique elements (heavy structs). How can efficiently pick M random, unique, elements from it?

E.g. V contains 10 elements: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } and I pick three...

  • 4, 0, 9
  • 0, 7, 8
  • But NOT this: 0, 5, 5 <--- not unique!

STL is preferred. So, something like this?

std::minstd_rand gen; // linear congruential engine??
std::uniform_int<int> unif(0, v.size() - 1);
gen.seed((unsigned int)time(NULL));

// ...?

// Or is there a good solution using std::random_shuffle for heavy objects?

回答1:

Create a random permutation of the range 0, 1, ..., N - 1 and pick the first M of them; use those as indices into your original vector.

A random permutation is easily made with the standard library by using std::iota together with std::random_shuffle:

std::vector<Heavy> v; // given

std::vector<unsigned int> indices(V.size());
std::iota(indices.begin(), indices.end(), 0);
std::random_shuffle(indices.begin(), indices.end());

// use V[indices[0]], V[indices[1]], ..., V[indices[M-1]]

You can supply random_shuffle with a random number generator of your choice; check the docu­men­tation for details.



回答2:

Most of the time, the method provided by Kerrek is sufficient. But if N is very large, and M is orders of magnitude smaller, the following method may be preferred.

Create a set of unsigned integers, and add random numbers to it in the range [0,N-1] until the size of the set is M. Then use the elements at those indexes.

std::set<unsigned int> indices;
while (indices.size() < M)
    indices.insert(RandInt(0,N-1));


回答3:

Since you wanted it to be efficient, I think you can get an amortised O(M), assuming you have to perform that operation a lot of times. However, this approach is not reentrant.

First of all create a local (i.e. static) vector of std::vector<...>::size_type (i.e. unsigned will do) values.

If you enter your function, resize the vector to match N and fill it with values from the old size to N-1:

static std::vector<unsigned> indices;
if (indices.size() < N) {
  indices.reserve(N);
  for (unsigned i = indices.size(); i < N; i++) {
    indices.push_back(i);
  }
}

Then, randomly pick M unique numbers from that vector:

std::vector<unsigned> result;
result.reserver(M);
for (unsigned i = 0; i < M; i++) {
  unsigned const r = getRandomNumber(0,N-i); // random number < N-i
  result.push_back(indices[r]);
  indices[r] = indices[N-i-1];
  indices[N-i-1] = r;
}

Now, your result is sitting in the result vector.

However, you still have to repair your changes to indices for the next run, so that indices is monotonic again:

for (unsigned i = N-M; i < N; i++) {
  // restore previously changed values
  indices[indices[i]] = indices[i];
  indices[i] = i;
}

But this approach is only useful, if you have to run that algorithm a lot and N doesn't grow so big that you cannot live with indices eating up RAM all the the time.



标签: c++ stl random