glibc rand function implementation

2019-01-19 23:39发布

问题:

I'm reading c standard library rand() function implementation with glibc source code. stdlib/random_r.c, line 359

int
__random_r (buf, result)
            struct random_data *buf;
            int32_t *result;
{
  int32_t *state;

  if (buf == NULL || result == NULL)
    goto fail;

  state = buf->state;

  if (buf->rand_type == TYPE_0)
    {
      int32_t val = state[0];
      val = ((state[0] * 1103515245) + 12345) & 0x7fffffff;
      state[0] = val;
      *result = val;
    }
  else
    {
      int32_t *fptr = buf->fptr;
      int32_t *rptr = buf->rptr;
      int32_t *end_ptr = buf->end_ptr;
      int32_t val;

      val = *fptr += *rptr;
      /* Chucking least random bit.  */
      *result = (val >> 1) & 0x7fffffff;
      ++fptr;
      if (fptr >= end_ptr)
        {
          fptr = state;
          ++rptr;
        }
      else
        {
          ++rptr;
          if (rptr >= end_ptr)
            rptr = state;
        }
      buf->fptr = fptr;
      buf->rptr = rptr;
    }
  return 0;

 fail:
  __set_errno (EINVAL);
  return -1;
}

I don't understand how random_r generate random number when (buf->rand_type != TYPE_0), anyone please explain? Thanks.

回答1:

glibc rand() has two different generator implementations:

  1. A simple linear congruential generator (LCG), defined by the following equation:

    val = ((state * 1103515245) + 12345) & 0x7fffffff

    (& 0x7fffffff throws away the least random most significant bit)

    This is a very simple, single state LCG. It has some drawbacks. The most important one is that, because it is a single state generator, it does not generate a fully pseudorandom number on each separate rand() call. What it really does is that it traverses the whole range (2^31) in a pseudorandom order. This has a meaningful implication: when you obtain some number it means that you will not obtain that number again in the present period. You will obtain that number again exactly in the next 2^31 rand() call, no sooner, no later.

    This generator is called the TYPE_0 in the glibc source.

  2. A slightly more advanced, additive feedback generator. That generator has many states, which means that it does not have the "traverse property" described above. You can get the same number twice (or more times) during the same period.

    You can find an excellent description of that algorithm here.

    This generator is called the TYPE_1, TYPE_2, TYPE_3 or TYPE_4 in the glibc source.

    Coming back to your question, that is how it generates values:

    seeding_stage() // (code omitted here, see the description from above link)
    
    for (i=344; i<MAX; i++)
    {
        r[i] = r[i-31] + r[i-3];
        val = ((unsigned int) r[i]) >> 1;
    }
    

    The code after the else in your question is simply the above code, but written in a different way (using pointers to the array containing previous values).

Which generator is used depends on the size of the initial state set with the initstate() function. The first (LCG) generator is used only when state size is 8 bytes. When it is bigger, the second generator is used. When you set your seed using srand() the size of the state is 128 bytes by default, so the second generator is used. Everything is written in comments in the glibc source file referenced by you in your question.



回答2:

In case anyone else needs a simple reimplementation of the GNU C Library's srand()/rand() functions, this C# class reproduces the generated random numbers exactly. The unchecked keyword is to explicitly allow integer overflow. (Based on Piotr Jurkiewicz's answer.)

public class GnuRand
{
    private uint[] r;
    private int n;

    public GnuRand(uint seed)
    {
        r = new uint[344];

        unchecked
        {
            r[0] = seed;
            for (int i = 1; i < 31; i++)
            {
                r[i] = (uint)((16807 * (ulong)r[i - 1]) % 2147483647);
            }
            for (int i = 31; i < 34; i++)
            {
                r[i] = r[i - 31];
            }
            for (int i = 34; i < 344; i++)
            {
                r[i] = r[i - 31] + r[i - 3];
            }
        }

        n = 0;
    }

    public int Next()
    {
        unchecked
        {
            uint x = r[n % 344] = r[(n + 313) % 344] + r[(n + 341) % 344];
            n = (n + 1) % 344;
            return (int)(x >> 1);
        }
    }
}


回答3:

The two implementations are exactly the same, except that they use different random data.

The TYPE_0 always uses the magic numbers 1103515245 and 12345, together with the current state.

Otherwise, it uses magic numbers taken from a pool of random data. Each time it is called it walks a bit further through the pool. As it goes it replaces the data with new pseudo-random numbers, based on the original ones, so that it gets new numbers when it wraps around and starts the walk again.



标签: c random glibc