可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I tested with the gcc C++ standard library's Mersenne twister implementation. It outperforms both the linear congruential generator and the C rand, which is most likely an LCG. A boost documentation also seems to give a similar result, but favoring Mersenne twister even more. Can anyone explain this?

#include <cstdlib>
#include <iostream>
#include <chrono>
#include <random>

class Timer
{
private:
  std::chrono::high_resolution_clock::time_point start_time;
  std::chrono::high_resolution_clock::time_point stop_time;

public:
  void start()
  {
    start_time = std::chrono::high_resolution_clock::now();
  }

  void stop()
  {
    stop_time = std::chrono::high_resolution_clock::now();
  }

  double measure()
  {
    using namespace std::chrono;
    return duration_cast<microseconds>
    (stop_time - start_time).count() / 1000000.0;
  }
};

template<typename T>
class Random
{
private:
  T generator;

public:
  Random()
  : generator
  (std::chrono::high_resolution_clock::now().time_since_epoch().count())
  {
  }

  int generate_integer(int begin, int end)
  {
    return std::uniform_int_distribution<int>(begin, end - 1)(generator);
  }
};

int main()
{
  constexpr int n = 300000000;
  Random<std::minstd_rand> mr;
  Random<std::mt19937> mt;
  Timer t;
  for (int j = 0; j < 3; ++j)
  {
    t.start();
    for (int i = 0; i < n; ++i)
    {
      static_cast<volatile void>(mr.generate_integer(0, 10));
    }
    t.stop();
    std::cout << "minstd " << t.measure() << std::endl;
    t.start();
    for (int i = 0; i < n; ++i)
    {
      static_cast<volatile void>(mt.generate_integer(0, 10));
    }
    t.stop();
    std::cout << "mersenne " << t.measure() << std::endl;
    t.start();
    for (int i = 0; i < n; ++i)
    {
      static_cast<volatile void>(std::rand() % 10);
    }
    t.stop();
    std::cout << "rand " << t.measure() << std::endl;
  }
}

result

minstd 4.70876
mersenne 1.55853
rand 4.11873
minstd 4.53199
mersenne 1.55928
rand 4.15159
minstd 4.5374
mersenne 1.55667
rand 4.13715

回答1:

The Mersenne Twister algorithm isn't as complicated as it looks. Or, more precisely, nearly all of the complicated part isn't executed often enough to seriously impact the long-term average speed.

If you look at the pseudocode implementation on Wikipedia, the vast majority of calls only exercise the second half of the extract_number() function; the rest of the non-initialization code (mostly in the twist() function) only runs in one call in 625 (in the most common version). The part that runs every time is very simple, just a handful of shifts and other bitwise operations, which can be expected to be very fast on most processors. The test at the beginning of extract_number() is nearly always false and so can be easily optimized by branch prediction.

Contrast this with the linear congruential algorithm, in which every call does an integer multiply (expensive) and modulo division (very expensive, unless you cheat by using a power of 2 modulus, which impacts the quality of your random numbers). The arithmetic involved in the LC and MT algorithms is so different that I'm not surprised if their relative performance varies from one system to another, but I have no trouble believing that MT is faster in at least some cases.

(If you look closely at the MT algorithm, it appears at first glance to be doing several modulo operations per iteration in twist(), but those are in forms that are easy to optimize away.)

As for plain old rand(), implementations of this vary widely and shouldn't be expected to be consistent across systems. Many implementations use 16-bit arithmetic and would naturally be faster than 32 or 64-bit algorithms.

回答2:

This is probably because rand is accessing thread local storage to retrieve its state.

I tried this using Visual Studio 2015 Community and got results similar to OP. Looking at the source for rand provided with the VS2012 compiler, rand() accesses thread local storage to get the previous value, which is then passed thru the LCRG math to generate the next one.

Using my own version of rand without the the local storage access gives me a time way faster - roughly 0.25 on OP's scale.

回答3:

I can't reproduce your results, when I try it rand appears much faster

chris@chris-thinkpad ~/cpp/test5 $ g++ -std=c++11  main.cpp -o main
chris@chris-thinkpad ~/cpp/test5 $ ./main 
minstd 18.168
mersenne 20.7626
rand 3.13027
minstd 17.8153
mersenne 20.8395
rand 3.19297
minstd 18.0667
mersenne 20.7672
rand 3.13617

Edit: When I do it with -O3, rand is still faster

chris@chris-thinkpad ~/cpp/test5 $ g++ -std=c++11 -O3 main.cpp -o main
chris@chris-thinkpad ~/cpp/test5 $ ./main 
minstd 7.74432
mersenne 8.54915
rand 3.04077
minstd 7.73824
mersenne 8.5711
rand 3.03335
minstd 7.74818
mersenne 8.55403
rand 3.03481

I think it is probably OS / compiler / configuration dependent? Maybe on Windows, calling std::rand() implicitly has to fetch the time from the OS or something to seed it, or something like this? (Edit: I'm not sure I understand the boost results though, and I doubt the boost results would reflect a problem like that)

My OS and compiler:

chris@chris-thinkpad ~/cpp/test5 $ cat /etc/issue
Linux Mint 17.1 Rebecca \n \l

chris@chris-thinkpad ~/cpp/test5 $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Edit: I did it again with "-fwhole-program", didn't change much:

chris@chris-thinkpad ~/cpp/test5 $ g++ -std=c++11 -fwhole-program -O3 main.cpp -o main
chris@chris-thinkpad ~/cpp/test5 $ ./main 
minstd 8.15607
mersenne 8.03688
rand 2.9622
minstd 8.17983
mersenne 7.99626
rand 2.90655
minstd 8.16007
mersenne 7.99331
rand 2.90902