Weighted random numbers

2018-12-31 18:06发布

问题:

I\'m trying to implement a weighted random numbers. I\'m currently just banging my head against the wall and cannot figure this out.

In my project (Hold\'em hand-ranges, subjective all-in equity analysis), I\'m using Boost\'s random -functions. So, let\'s say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost\'s mersenne twister generator works like a charm for this. However, I want the pick to be weighted for example like this:

1 (weight: 90)
2 (weight: 56)
3 (weight:  4)

Does Boost have some sort of functionality for this?

回答1:

There is a straightforward algorithm for picking an item at random, where items have individual weights:

1) calculate the sum of all the weights

2) pick a random number that is 0 or greater and is less than the sum of the weights

3) go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item\'s weight

Pseudo-code illustrating this:

int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
   sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
  if(rnd < choice_weight[i])
    return i;
  rnd -= choice_weight[i];
}
assert(!\"should never get here\");

This should be straightforward to adapt to your boost containers and such.


If your weights are rarely changed but you often pick one at random, and as long as your container is storing pointers to the objects or is more than a few dozen items long (basically, you have to profile to know if this helps or hinders), then there is an optimisation:

By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.


If you do not know the number of items in the list, then there\'s a very neat algorithm called reservoir sampling that can be adapted to be weighted.



回答2:

Updated answer to an old question. You can easily do this in C++11 with just the std::lib:

#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>

int main()
{
    // Set up distribution
    double interval[] = {1,   2,   3,   4};
    double weights[] =  {  .90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Choose generator
    std::mt19937 gen(std::time(0));  // seed as wanted
    // Demonstrate with N randomly generated numbers
    const unsigned N = 1000000;
    // Collect number of times each random number is generated
    double avg[std::extent<decltype(weights)>::value] = {0};
    for (unsigned i = 0; i < N; ++i)
    {
        // Generate random number using gen, distributed according to dist
        unsigned r = static_cast<unsigned>(dist(gen));
        // Sanity check
        assert(interval[0] <= r && r <= *(std::end(interval)-2));
        // Save r for statistical test of distribution
        avg[r - 1]++;
    }
    // Compute averages for distribution
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;
    // Display distribution
    for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
        std::cout << \"avg[\" << i << \"] = \" << avg[i-1] << \'\\n\';
}

Output on my system:

avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544

Note that most of the code above is devoted to just displaying and analyzing the output. The actual generation is just a few lines of code. The output demonstrates that the requested \"probabilities\" have been obtained. You have to divide the requested output by 1.5 since that is what the requests add up to.



回答3:

If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:

#include <random>
#include <vector>
std::vector<double> weights{90,56,4};
std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
std::mt19937 gen;
gen.seed(time(0));//if you want different results from different runs
int N = 100000;
std::vector<int> samples(N);
for(auto & i: samples)
    i = dist(gen);
//do something with your samples...

Note, however, that the c++11 discrete_distribution computes all of the cumulative sums on initialization. Usually, you want that because it speeds up the sampling time for a one time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance if the weights represented how many items there are and every time you draw one, you remove it, you will probably want a custom algorithm.

Will\'s answer https://stackoverflow.com/a/1761646/837451 avoids this overhead but will be slower to draw from than the C++11 because it can\'t use binary search.

To see that it does this, you can see the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):

  template<typename _IntType>
    void
    discrete_distribution<_IntType>::param_type::
    _M_initialize()
    {
      if (_M_prob.size() < 2)
        {
          _M_prob.clear();
          return;
        }

      const double __sum = std::accumulate(_M_prob.begin(),
                                           _M_prob.end(), 0.0);
      // Now normalize the probabilites.
      __detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
                            __sum);
      // Accumulate partial sums.
      _M_cp.reserve(_M_prob.size());
      std::partial_sum(_M_prob.begin(), _M_prob.end(),
                       std::back_inserter(_M_cp));
      // Make sure the last cumulative probability is one.
      _M_cp[_M_cp.size() - 1] = 1.0;
    }


回答4:

What I do when I need to weight numbers is using a random number for the weight.

For example: I need that generate random numbers from 1 to 3 with the following weights:

  • 10% of a random number could be 1
  • 30% of a random number could be 2
  • 60% of a random number could be 3

Then I use:

weight = rand() % 10;

switch( weight ) {

    case 0:
        randomNumber = 1;
        break;
    case 1:
    case 2:
    case 3:
        randomNumber = 2;
        break;
    case 4:
    case 5:
    case 6:
    case 7:
    case 8:
    case 9:
        randomNumber = 3;
        break;
}

With this, randomly it has 10% of the probabilities to be 1, 30% to be 2 and 60% to be 3.

You can play with it as your needs.

Hope I could help you, Good Luck!



回答5:

Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each items is proportional to your weighting.

Example:

  • 1 60%
  • 2 35%
  • 3 5%

So have a bag with 100 items with 60 1\'s, 35 2\'s and 5 3\'s.
Now randomly sort the bag (std::random_shuffle)

Pick elements from the bag sequentially until it is empty.
Once empty re-randomize the bag and start again.



回答6:

Choose a random number on [0,1), which should be the default operator() for a boost RNG. Choose the item with cumulative probability density function >= that number:

template <class It,class P>
It choose_p(It begin,It end,P const& p)
{
    if (begin==end) return end;
    double sum=0.;
    for (It i=begin;i!=end;++i)
        sum+=p(*i);
    double choice=sum*random01();
    for (It i=begin;;) {
        choice -= p(*i);
        It r=i;
        ++i;
        if (choice<0 || i==end) return r;
    }
    return begin; //unreachable
}

Where random01() returns a double >=0 and <1. Note that the above doesn\'t require the probabilities to sum to 1; it normalizes them for you.

p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.



回答7:

I\'ve implemented several simple weighted random algorithms.



标签: c++ boost random