18 trillion coin tosses, where did I go wrong?

Posted 2019-05-14 03:00

Question:

Why does the following C code give me different results on my desktop and server, both running similar versions of Linux?

It finds the longest run of consecutive same-side results in 18 trillion coin tosses. [See Iain M. Banks' science fiction novel Consider Phlebas.]

On the server, after 15.7 trillion coin tosses (it's still running), the longest same-side run so far is only 29. Since 2^44 = 17,592,186,044,416, I'd expect the longest run to be somewhere in the low to mid 40s, and probably 44 after all 18 trillion have been completed.

On the desktop, after only 4.7 billion coin tosses, the longest run was already 31. Since 2^31 = 2,147,483,648, that sounded about right.

So why have I got a sequence of only 29 on the server after 15.7 trillion coin tosses but a sequence of 31 after only 4.7 billion on my desktop?

Modulo bias was my first thought. RAND_MAX is the same on both desktop and server: 2,147,483,647 (the largest 32-bit signed integer). So rand() will give me a number in the range 0 <= rand() <= 2,147,483,647. That range starts on an even number and ends on an odd one, so it contains equally many even and odd values. Unless I'm very much mistaken, there's no modulo bias introduced by my int rand_num = (rand() % 2); line of code.

I know that the C standard library's pseudo-random number generator is not considered adequate for cryptography. Surely that could not be a factor when generating sequences of zeros and ones, even admittedly rather long ones. Could it?

Here's the source:

Compiled on both machines using: gcc -O3 -o 18TCT 18TrillionCoinTosses.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char* argv[])
{
    srand(time(NULL));

    int current_seq = 0;
    int longest_seq = 0;
    int prev_rand_num = -1;

    long long i = 0;
    long long total = 18000000000000;

    // To serve as a rudimentary progress indicator.
    long billion_counter = 0;
    long billion = 1000000000;

    while (i < total)
    {
        int rand_num = (rand() % 2);

        if (rand_num == prev_rand_num)
        {
            current_seq++;

            if (current_seq >= longest_seq)
            {
                longest_seq = current_seq;
                printf("Longest sequence so far: %d (on iteration %lli)\n", longest_seq, i);
            }
        }
        else
            current_seq = 1;

        if (billion_counter == billion)
        {
            billion_counter = 0;
            printf("Progress report, current iteration: %lli\n", i);
        }

        prev_rand_num = rand_num;

        i++;
        billion_counter++;
    }

    printf("\nTotal coins tossed: %lli\n", i);
    printf("Longest sequence: %d\n", longest_seq);
}

Answer 1:

Your random number generator is probably repeating after 2^32 = 4294967296 calls, so you're not really simulating 18 trillion trials. You need a better RNG, one that keeps more than 32 bits of internal state. On many systems, you can access a better RNG by simply calling random() instead of rand(). (On my system, man random says "random -- better random number generator" and "The period of this random number generator is very large, approximately 16*((2**31)-1)". Although that's "only" 34,359,738,352, which is still short of your 18 trillion.)

Also, as a side point, rand() % 2 is risky, although most RNGs these days don't have the problem that will burn you there. If you did have that problem, you'd know it: the low-order bit would simply alternate 0, 1, 0, 1, so among other things your program would report 0 in a row no matter what.


Addendum: You can find references to some other, better random-number generators at question 13.15 in the C FAQ list: http://c-faq.com/lib/rand.html .



Answer 2:

Even though bit 0 of your "random" numbers has equal numbers of zeros and ones, the sequence produced by the pseudo-random generator function rand() repeats relatively quickly. In my test it repeated after 2147483648 (2**31) iterations of the loop, so there is no point going to 18 trillion. I ran the test several times and always got the same result.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
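    unsigned long long n = 0;   // number of rand() calls until the sequence repeats
    int a, b, c, d;             // first four outputs, kept as a reference
    int e, f, g, h;             // sliding window over the four most recent outputs

    srand((unsigned)time(NULL));
    e = a = rand();
    f = b = rand();
    g = c = rand();
    h = d = rand();
    do {
        n++;
        // Shift the window by one and pull in the next value.
        e = f;
        f = g;
        g = h;
        h = rand();
    } while (e != a || f != b || g != c || h != d);  // stop when the window matches the start
    printf("%llu\n", n);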
}


Answer 3:

Your code seems to be fine. The problem might be the RNG you are using.

I don't think that rand() % 2 is uniform. Take a look here: Uniformity of random numbers taken modulo N

Why not C++11 Random Number Generators? http://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution

Last but not least, could -O3 be messing something up?

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths, -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.



Answer 4:

As others have pointed out, rand is not a reliable source of randomness. It's right there in the man page:

NAME
     rand, rand_r, srand, sranddev -- bad random number generator

...

DESCRIPTION
     These interfaces are obsoleted by arc4random(3).

For good randomness you'll have to go outside the standard C libraries.

  • arc4random, the suggested replacement.
  • drand48
  • OpenSSL's RAND_bytes is cryptographically secure, but can be hard to use. Here is a good example of how to use it.
  • PCG, a replacement for the Mersenne Twister

Note that if you're on a Mac it will complain that RAND_bytes() is deprecated. Don't worry, OpenSSL isn't going anywhere and is fine to use. The deprecation has to do with binary compatibility issues when upgrading Apple products.



Tags: c random