std::normal_distribution results in wrong

2020-03-24 04:03发布

问题:

Anyone came access this issue? Per 1 the implementations are not required to produce same data. What about in practice - are there many differences in STL implementations among arm, x86, free and commercial compilers?

// g++ --std=c++11 -o a minimal.cpp && ./a

#include <iostream>
#include <random>

using namespace std;

int
main()
{
  std::mt19937_64 gen;
  gen.seed(17);

  cout << "\nNormal\n";
  normal_distribution<double> distr1;
  for (int i = 0; i < 2; i++) {
    double delay = distr1(gen);
    printf("  Value = %15.15g\n", delay);
  }
  return(0);
}


/*

Results

(1) gcc-4.8.0 linux 64b version result
  Normal
    Value = 1.03167351251536
    Value = 1.21967569130525
(2) Microsoft Visual Studio Community 2015 Version 14.0.23107.0 D14REL
      or 
    Microsoft Visual Studio Professional 2012 Version 11.0.60610.01 Update 3
  Normal
    Value = 1.21967569130525
    Value = 1.03167351251536  // same values in wrong (different) order
*/

I could understand use of a different algorithm for either the generator or the distribution on some special HW platform but this difference seems more as a bug.

Here is some more code I used to diagnose where does the difference come from and workaround it: - Generator and uniform distribution match on win and linux. - The normal distribution matches numerically except for pair-wise order

//   g++ --std=c++11 -o a workaround.cpp && ./a


#include <iostream>
#include <random>
#include <stack>


using namespace std;

typedef  std::mt19937_64  RandGenLowType;

// Helper wrapper - it did confirm that the differences
// do NOT come from the generator
class RandGenType : public RandGenLowType {
  public:
    result_type operator()() {
      result_type val = RandGenLowType::operator()();
      printf("    Gen pulled %20llu\n", val);
      return(val);
    }
};


typedef normal_distribution<double> NormalDistrLowType;

// Workaround wrapper to swap the output data stream pairwise
class NormalDistrType : NormalDistrLowType {
  public:
    result_type operator()(RandGenType &pGen) {
      // Keep single flow (used variables, includes) same for all platforms
      if (win64WaStack.empty()) {
        win64WaStack.push(NormalDistrLowType::operator()(pGen));
#ifdef _MSC_VER
        win64WaStack.push(NormalDistrLowType::operator()(pGen));
#endif
      }
      result_type lResult = win64WaStack.top();
      win64WaStack.pop();
      return(lResult);
    }    
  private:
    std::stack<result_type> win64WaStack;
};


int
main()
{
  RandGenType gen;
  gen.seed(17);

  // No platform issue, no workaround used
  cout << "\nUniform\n";
  uniform_real_distribution<double> distr;
  for (int i = 0; i < 4; i++) {
    double delay = distr(gen);
    printf("  Delay = %15.15g\n", delay);
  }

  // Requires the workaround
#ifdef _MSC_VER
  cout << "Workaround code is active, swapping the output stream pairwise\n";
#endif
  cout << "\nNormal\n";
  //normal_distribution<float> distr1;
  NormalDistrType distr1;
  for (int i = 0; i < 10; i++) {
    double delay = distr1(gen);
    printf("  Value = %15.15g\n", delay);
  }
  return(0);
}

回答1:

Several common methods of generating normally distributed random numbers, such as the Box-Muller transform and the Marsaglia polar method, generate two random numbers at once. A distribution object using one of those methods would generate two random numbers, return one of them, and save the other one for the next time it's called.

Which one is returned and which one is stored is, of course, completely up to the library author. It looks like libstdc++ and MSVC use the same algorithm, but happen to choose differently.