What is performance-wise the best way to generate random bools?

Posted 2019-01-31 06:02

Question:

I need to generate random Boolean values on a performance-critical path.

The code I wrote for this is:

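#include <random>

std::random_device rd;
std::mt19937 gen(rd());  // seed the engine once and reuse it
std::uniform_int_distribution<> randomizer(0, 1);
const int val = randomizer(gen);  // distributions take the engine by non-const reference, so it must be a named object
const bool isDirectionChanged = static_cast<bool>(val);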

But I do not think this is the best way to do it, as I do not like using static_cast<bool>.

On the web I have found a few more solutions:

1. std::bernoulli_distribution

2. bool randbool = rand() & 1; Remember to call srand() at the beginning.
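A minimal sketch of the two alternatives from the list, assuming a single-threaded program (the function-local static engine is not synchronized); the names randBoolBernoulli and randBoolRand are hypothetical. What matters for speed is that the engine is seeded once and then reused, rather than constructed on every call.

```cpp
#include <cstdlib>
#include <random>

// Option 1: std::bernoulli_distribution. The engine and the distribution are
// constructed once (function-local statics) and then reused on every call.
bool randBoolBernoulli() {
    static std::mt19937 engine(std::random_device{}());
    static std::bernoulli_distribution coin(0.5);  // P(true) = 0.5
    return coin(engine);
}

// Option 2: the lowest bit of rand(). Call std::srand() once before first use.
// Some older C library implementations of rand() had weak low-order bits.
bool randBoolRand() {
    return (std::rand() & 1) != 0;
}
```

Call std::srand() once at startup before using randBoolRand(); randBoolBernoulli() seeds itself on its first call.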

Answer 1:

For the sake of performance, at the price of less "randomness" than, e.g., std::mt19937_64, you can use Xorshift+ to generate 64-bit numbers and then use their bits as pseudo-random booleans.

Quoting Wikipedia:

This generator is one of the fastest generators passing BigCrush

Details: http://xorshift.di.unimi.it/. The comparison table in the middle of the page shows that mt19937_64 is about 2 times slower and fails some tests systematically.

Below is sample code (the real code should wrap it in a class):

#include <cstdint>
#include <random>
using namespace std;

random_device rd;
/* The state must be seeded so that it is not everywhere zero. */
uint64_t s[2] = { (uint64_t(rd()) << 32) ^ (rd()),
    (uint64_t(rd()) << 32) ^ (rd()) };
uint64_t curRand;
uint8_t bit = 63;

uint64_t xorshift128plus(void) {
    uint64_t x = s[0];
    uint64_t const y = s[1];
    s[0] = y;
    x ^= x << 23; // a
    s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
    return s[1] + y;
}

bool randBool()
{
    if(bit >= 63)
    {
        curRand = xorshift128plus();
        bit = 0;
        return curRand & 1;
    }
    else
    {
        bit++;
        return (curRand >> bit) & 1; // shift the 64-bit value; 1<<bit overflows int for bit >= 31
    }
}


Answer 2:

Some quick benchmarks (code):

   647921509 RandomizerXorshiftPlus
   821202158 BoolGenerator2 (reusing the same buffer)
  1065582517 modified Randomizer
  1130958451 BoolGenerator2 (creating a new buffer as needed)
  1140139042 xorshift128plus
  2738780431 xorshift1024star
  4629217068 std::mt19937
  6613608092 rand()
  8606805191 std::bernoulli_distribution
 11454538279 BoolGenerator
 19288820587 std::uniform_int_distribution
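The benchmark code is only linked, so the following is a hypothetical harness showing one way such timings can be collected, not the code that produced the table; timeBoolGenerator and timeBernoulli are illustrative names. It counts the true results so the calls cannot be optimized away, and reports elapsed nanoseconds from std::chrono::steady_clock.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <utility>

// Calls gen() n times; returns {elapsed nanoseconds, number of true results}.
// Accumulating the results keeps the compiler from discarding the calls.
template <typename BoolGen>
std::pair<long long, std::uint64_t> timeBoolGenerator(BoolGen&& gen, std::uint64_t n) {
    std::uint64_t trues = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        trues += gen() ? 1 : 0;
    }
    const auto stop = std::chrono::steady_clock::now();
    const long long ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return {ns, trues};
}

// Example: time std::bernoulli_distribution driven by a fixed-seed std::mt19937.
std::pair<long long, std::uint64_t> timeBernoulli(std::uint64_t n) {
    std::mt19937 engine(42);
    std::bernoulli_distribution coin(0.5);
    return timeBoolGenerator([&] { return coin(engine); }, n);
}
```

Any candidate can be passed as a lambda, e.g. timeBoolGenerator([&] { return g.randBool(); }, 100000000) where g is an instance of the class below. Absolute numbers depend heavily on compiler flags and hardware.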

For those who want ready-to-use code, I present XorShift128PlusBitShifterPseudoRandomBooleanGenerator, a tweaked version of RandomizerXorshiftPlus from the above link. On my machine, it is about as fast as @SergeRogatch's solution, but consistently about 10-20% faster when the loop count is high (≳100,000), and up to ~30% slower with smaller loop counts.

#include <climits>
#include <cstdint>
#include <random>

class XorShift128PlusBitShifterPseudoRandomBooleanGenerator {
public:
  bool randBool() {
    if (counter == 0) {
      counter = sizeof(GeneratorType::result_type) * CHAR_BIT;
      random_integer = generator();
    }
    return (random_integer >> --counter) & 1;
  }

private:
  class XorShift128Plus {
  public:
    using result_type = uint64_t;

    XorShift128Plus() {
      std::random_device rd;
      state[0] = rd();
      state[1] = rd();
    }

    result_type operator()() {
      auto x = state[0];
      auto y = state[1];
      state[0] = y;
      x ^= x << 23;
      state[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
      return state[1] + y;
    }

  private:
    result_type state[2];
  };

  using GeneratorType = XorShift128Plus;

  GeneratorType generator;
  GeneratorType::result_type random_integer;
  int counter = 0;
};


Answer 3:

One way is to generate a single unsigned long long for every 64 calls, as stated in the comments. An example:

#include <limits>
#include <random>

class Randomizer
{
public:
    Randomizer()
        : m_gen(std::random_device{}()),
          randomizer(0, std::numeric_limits<unsigned long long>::max()),
          m_rand(0), counter(0) {}

    bool RandomBool()
    {
        if (!counter)
        {
            // Draw 64 fresh bits from the engine, which is seeded only once
            // (a temporary engine cannot be passed to the distribution).
            m_rand = randomizer(m_gen);
            counter = sizeof(unsigned long long) * 8;
        }
        return (m_rand >> --counter) & 1;
    }
private:
    std::mt19937 m_gen;
    std::uniform_int_distribution<unsigned long long> randomizer;
    unsigned long long m_rand;
    int counter;
};


Answer 4:

I would prefill a (long enough) circular buffer of 64-bit random values, and then very quickly take one bit at a time whenever a random boolean is needed:

#include <stdint.h>

class BoolGenerator {
  private:
  static const int BUFFER_SIZE = 65536;
  uint64_t randomBuffer[BUFFER_SIZE];
  uint64_t mask;
  int counter;

  void advanceCounter() {
    counter++;
    if (counter == BUFFER_SIZE) {
        counter = 0;
    }
  }

  public:
  BoolGenerator() {
    //HERE FILL YOUR BUFFER WITH A RANDOM GENERATOR
    mask = 1;
    counter = 0;
  }

  bool generate() {
    mask <<= 1;
    if (!mask) { //After 64 shifts the mask becomes zero
        mask = 1;//reset mask
        advanceCounter();//get the next value in the buffer
    }
    return randomBuffer[counter] & mask;
  }
};

Of course, the class can be made generic over the buffer size, the random generator, the base type (which doesn't have to be uint64_t), etc.
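A rough sketch of that generalization, with the word type, buffer size and engine as template parameters. BufferedBoolGenerator is a hypothetical name, std::mt19937_64 as the default engine is an assumption, and the buffer lives on the heap in a std::vector instead of a fixed array. Like the original, the output sequence repeats once all BufferSize words have been consumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <type_traits>
#include <vector>

// Word must be an unsigned type accepted by std::uniform_int_distribution,
// e.g. std::uint16_t, std::uint32_t or std::uint64_t (8-bit char types are
// not allowed there by the standard).
template <typename Word, std::size_t BufferSize, typename Engine = std::mt19937_64>
class BufferedBoolGenerator {
    static_assert(std::is_unsigned<Word>::value, "Word must be unsigned");
    static_assert(BufferSize > 0, "buffer must not be empty");

public:
    explicit BufferedBoolGenerator(typename Engine::result_type seed)
        : buffer_(BufferSize) {
        Engine engine(seed);
        std::uniform_int_distribution<Word> dist(0, std::numeric_limits<Word>::max());
        for (Word& w : buffer_) w = dist(engine);  // prefill the buffer once
    }

    bool generate() {
        if (bitsLeft_ == 0) {  // current word exhausted: load the next one
            current_ = buffer_[index_];
            index_ = (index_ + 1) % BufferSize;  // wrap around (circular buffer)
            bitsLeft_ = std::numeric_limits<Word>::digits;
        }
        --bitsLeft_;
        const bool bit = (current_ & 1u) != 0;  // take the lowest bit
        current_ = static_cast<Word>(current_ >> 1);
        return bit;
    }

private:
    std::vector<Word> buffer_;
    std::size_t index_ = 0;
    Word current_ = 0;
    int bitsLeft_ = 0;
};
```

For example, BufferedBoolGenerator<std::uint64_t, 65536> gen(seed); followed by repeated gen.generate() calls plays the role of the fixed-size class above, with the buffer-filling step already written.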


Accessing the buffer only once every 64 calls:

#include <stdint.h>
#include <limits>
#include <random>

class BoolGenerator {
  private:
  static const int BUFFER_SIZE = 65536;
  uint64_t randomBuffer[BUFFER_SIZE];
  uint64_t currValue;
  int bufferCounter;
  int bitCounter;

  void advanceBufferCounter() {
    bufferCounter++;
    if (bufferCounter == BUFFER_SIZE) {
        bufferCounter = 0;
    }
  }

  void getNextValue() {
      currValue = randomBuffer[bufferCounter];
      bitCounter = sizeof(uint64_t) * 8;
      advanceBufferCounter();
  }

  //HERE FILL YOUR BUFFER WITH A RANDOM GENERATOR
  void initializeBuffer() {
  //Anything will do, taken from here: http://stackoverflow.com/a/19728404/2436175
      std::random_device rd;
      std::mt19937 rng(rd());
      std::uniform_int_distribution<uint64_t> uni(0,std::numeric_limits<uint64_t>::max());
      for (int i = 0; i < BUFFER_SIZE; i++ ) {
          randomBuffer[i] = uni(rng);
      }
  }

  public:
  BoolGenerator() {
      initializeBuffer();
      bufferCounter = 0;
      getNextValue();
  }

  bool generate() {
      if (!bitCounter) {
           getNextValue();
      }
      //A variation of other methods seen around
      bitCounter--;
      bool retVal = currValue & 0x01;
      currValue >>= 1;
      return retVal;
  }
};


Answer 5:

Unless you have further constraints on the randomness you need, the fastest way to generate a random bool is:

bool RandomBool() { return false; }

To be more specific, there are thousands of ways to generate random boolean numbers, all satisfying different constraints, and many of them do not deliver "truly" random numbers (that includes all the other answers so far). The word "random" alone does not tell anyone what properties you really need.



Answer 6:

If performance is your only criterion, then the answer is:

bool get_random()
{
    return true; // chosen by fair coin flip.
                 // guaranteed to be random.
}

Unfortunately, the entropy of this random number is zero, but the performance is quite fast.

Since I suspect that this random number generator is not very useful to you, you will need to quantify how random you want your booleans to be. How about a cycle length of 2048? One million? 2^19937-1? Until the end of the universe?

I suspect that, since you explicitly stated that performance is your utmost concern, a good old-fashioned linear congruential generator might be "good enough". Based on this article, I'm guessing that this generator's period is around 32*((2^31)-5), or about 68 billion iterations. If that's not "good enough", you can drop in any C++11-compatible generator you like instead of minstd_rand.
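To make "linear congruential generator" concrete: std::minstd_rand computes x_{n+1} = 48271 * x_n mod (2^31 - 1). The hand-written class below (MinStdLcg is a hypothetical name) is a sketch of that recurrence, checked at compile time against the standard engine's parameters; each step is one multiplication and one modulo, which is why this generator is so cheap.

```cpp
#include <cstdint>
#include <random>

// Same recurrence as std::minstd_rand: x_{n+1} = 48271 * x_n mod (2^31 - 1).
class MinStdLcg {
public:
    static constexpr std::uint32_t kMultiplier = 48271u;
    static constexpr std::uint32_t kModulus = 2147483647u;  // 2^31 - 1

    explicit MinStdLcg(std::uint32_t seed = 1u) : state_(seed % kModulus) {
        if (state_ == 0) state_ = 1;  // a zero state would stay zero forever
    }

    std::uint32_t operator()() {
        // 64-bit intermediate so that kMultiplier * state_ cannot overflow.
        state_ = static_cast<std::uint32_t>(
            static_cast<std::uint64_t>(state_) * kMultiplier % kModulus);
        return state_;
    }

private:
    std::uint32_t state_;
};

// The standard library engine uses exactly these parameters.
static_assert(std::minstd_rand::multiplier == MinStdLcg::kMultiplier, "multiplier");
static_assert(std::minstd_rand::increment == 0, "increment");
static_assert(std::minstd_rand::modulus == MinStdLcg::kModulus, "modulus");
```

The C++ standard requires the 10000th output of a default-constructed std::minstd_rand to be 399268537, and the same holds for this class with its default seed of 1, which makes a convenient self-check. The outputs lie in [1, 2^31 - 2], so only the low 31 bits are usable as booleans.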

For extra credit, and a small performance hit, modify the code below to use the biased coin algorithm to remove bias from the generator.

#include <iostream>
#include <random>

bool get_random()
{
    typedef std::minstd_rand generator_type;
    typedef generator_type::result_type result_type;

    static generator_type generator;
    static unsigned int bits_remaining = 0;
    static result_type random_bits;

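    if ( bits_remaining == 0 )
    {
        random_bits = generator();
        // minstd_rand returns values in [1, 2^31 - 2], so only the low
        // 31 bits carry randomness; bit 31 and above are always zero.
        bits_remaining = 31;
    }

    // Extract one bit by shifting the value down (bits 30 .. 0).
    return ( ( random_bits >> --bits_remaining ) & 1u ) != 0;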
}

int main()
{
    for ( unsigned int i = 0; i < 1000; i++ )
    {
        std::cout << " Choice " << i << ": ";
        if ( get_random() )
            std::cout << "true";
        else
            std::cout << "false";

        std::cout << std::endl;
    }
}
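A sketch of the "biased coin" idea mentioned above, using von Neumann's debiasing trick: read the generator's bits in pairs, keep only unequal pairs, and return the first bit of the pair. vonNeumannUnbiased and BiasedCoin are hypothetical names, and BiasedCoin is only a deliberately biased demo source. The guarantee assumes successive input bits are independent with a constant bias, which an LCG such as minstd_rand only approximates.

```cpp
#include <random>

// Von Neumann debiasing: if P(bit) = p for independent bits, the pairs (1,0)
// and (0,1) both occur with probability p*(1-p), so keeping only unequal
// pairs yields fair output. Equal pairs (0,0) and (1,1) are discarded.
template <typename BiasedBitSource>
bool vonNeumannUnbiased(BiasedBitSource& biasedBit) {
    for (;;) {
        const bool first = biasedBit();
        const bool second = biasedBit();
        if (first != second) {
            return first;  // (1,0) -> true, (0,1) -> false
        }
        // Equal pair: throw it away and draw another pair.
    }
}

// Demo source with a deliberate 80% bias toward true (illustrative only).
struct BiasedCoin {
    std::mt19937 engine{12345};
    std::bernoulli_distribution coin{0.8};
    bool operator()() { return coin(engine); }
};
```

To apply it to get_random(), wrap the call in a callable and pass it by reference, e.g. auto src = [] { return get_random(); }; bool b = vonNeumannUnbiased(src);. With an already fair source it consumes four input bits per output bit on average.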


Answer 7:

If performance is important, perhaps it's a good idea to generate a 32-bit random number and use each of its bits separately, something like this:

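#include <cstdint>
#include <random>

bool getRandBool() {
    // Stand-in for "whatever your favorite random number generator is";
    // std::mt19937 is one choice that produces 32-bit values.
    static std::mt19937 rng(std::random_device{}());
    static uint32_t randomnumber;
    static int i = 0;
    if (i == 0) {
        randomnumber = static_cast<uint32_t>(rng());
        i = 32;
    }
    return (randomnumber >> --i) & 1u;  // shifting the value avoids 1<<31 overflowing int
}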

This way, the generator is only invoked on every 32nd call.



Answer 8:

I think the best way is to use a precalculated random array:

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <cstring>

uint8_t g_rand[UINT16_MAX];
bool InitRand()
{
    for (size_t i = 0, n = UINT16_MAX; i < n; ++i)
        g_rand[i] = ::rand() & 1;
    return true;
}
bool g_inited = InitRand();
inline const uint8_t * Rand()
{
    return g_rand + (::rand()&INT16_MAX);
}

It can be used to fill some array dst[size]:

const size_t size = 10000;
bool dst[size];
for (size_t i = 0; i < size; i += INT16_MAX)
     memcpy(dst + i, Rand(), std::min<size_t>(INT16_MAX, size - i));

Of course, you can initialize the precalculated array using another random function.



Answer 9:

Apparently I have to add another answer. I just found out that, starting with the Ivy Bridge architecture, Intel added the RdRand CPU instruction, and AMD added it later, in June 2015. So if you are targeting a processor that is new enough and don't mind using (inline) assembly, the fastest way to generate random bools should be to call the RdRand instruction to get a 64-bit random number, as described here (scroll to approximately the middle of the page for code examples), and then use the bits of that number as booleans, as described in my Xorshift+ based answer. That link also has a code example for checking whether the current CPU supports RdRand; see also Wikipedia for an explanation of how to do this with the CPUID instruction.