Possible sources for random number seeds

Posted 2019-01-26 10:14

Question:

Two points up front. First, the example is in Fortran, but I think the issue applies to any language. Second, I know the built-in random number generators are not truly random and that other generators exist, but we are not interested in using them for what we are doing.

Most discussions of random seeds acknowledge that if the program does not seed the generator at run time, the seed is fixed at compile time. The same sequence of numbers is then generated every time the program runs, which defeats the purpose of random numbers. One way to overcome this is to seed the random number generator from the system clock.

However, when running in parallel with MPI on a multi-core machine, the system clock approach gave us the same kind of problem. The sequences changed from run to run, but every process read the same system clock value and therefore got the same seed and the same sequence.

So consider the following example code:

PROGRAM clock_test
   IMPLICIT NONE
   INCLUDE "mpif.h"
   INTEGER :: ierr, rank, clock, i, n, method
   INTEGER, DIMENSION(:), ALLOCATABLE :: seed
   REAL(KIND=8) :: random
   INTEGER, PARAMETER :: OLD_METHOD = 0, &
                         NEW_METHOD = 1

   CALL MPI_INIT(ierr)

   CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

   CALL RANDOM_SEED(SIZE=n)
   ALLOCATE(seed(n))

   DO method = 0, 1
      SELECT CASE (method)
      CASE (OLD_METHOD)
         CALL SYSTEM_CLOCK(COUNT=clock)
         seed = clock + 37 * (/ (i - 1, i = 1, n) /)
         CALL RANDOM_SEED(put=seed)  
         CALL RANDOM_NUMBER(random)

         WRITE(*,*) "OLD Rank, dev = ", rank, random
      CASE (NEW_METHOD)
         OPEN(89,FILE='/dev/urandom',ACCESS='stream',FORM='UNFORMATTED')
         READ(89) seed
         CLOSE(89)
         CALL RANDOM_SEED(put=seed)  
         CALL RANDOM_NUMBER(random)

         WRITE(*,*) "NEW Rank, dev = ", rank, random
      END SELECT
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
   END DO

   CALL MPI_FINALIZE(ierr)
END PROGRAM clock_test

When run on my workstation with 2 cores, this gives:

OLD Rank, dev =            0  0.330676306089146     
OLD Rank, dev =            1  0.330676306089146     
NEW Rank, dev =            0  0.531503215980609     
NEW Rank, dev =            1  0.747413828750221     

So we overcame the clock issue by reading the seed from /dev/urandom instead. This way each process gets its own seed, and therefore its own random sequence.
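For comparison, here is a minimal C sketch of the same idea with explicit error checking, which the Fortran example above omits. The function name read_urandom_seed is hypothetical, and the sketch assumes a Unix-like system that provides /dev/urandom.

```c
#include <stdio.h>

/* Hypothetical helper: fill seed[0..n-1] with bytes from /dev/urandom.
 * Returns 0 on success, -1 if the device cannot be opened or read in full. */
int read_urandom_seed(unsigned int *seed, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(seed, sizeof seed[0], n, f);
    fclose(f);

    return (got == n) ? 0 : -1;
}
```

A caller that gets -1 back could fall back to another seed source rather than continue with an uninitialized seed.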

What other seed approaches are there that will work in a multi-core, MPI system and still be unique on each core, from run to run?

Answer 1:

If you take a look at Random Numbers In Scientific Computing: An Introduction by Katzgraber (an excellent, lucid discussion of the ins and outs of using PRNGs for technical computing), the suggestion for parallel runs is to use a hash function of the time and the PID to generate a seed. From section 7.1:

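#include <stdlib.h>   /* labs */
#include <time.h>     /* time, time_t */
#include <unistd.h>   /* getpid */

long seedgen(void)  {
    long s, seed, pid;
    time_t seconds;

    pid = getpid();
    s = time ( &seconds ); /* get seconds since 01/01/1970 */

    /* labs, not abs: the operand is a long */
    seed = labs(((s*181)*((pid-83)*359))%104729);
    return seed;
}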

Of course, in Fortran this would be something like

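function seedgen(pid)
    use iso_fortran_env
    implicit none
    integer(kind=int64) :: seedgen
    integer, intent(IN) :: pid
    integer :: s
    integer(kind=int64), parameter :: m = 104729_int64
    integer(kind=int64) :: a, b

    call system_clock(s)
    ! Work in 64-bit and reduce each factor modulo m first,
    ! so the product cannot overflow
    a = mod(int(s, int64)*181_int64, m)
    b = mod((int(pid, int64) - 83_int64)*359_int64, m)
    seedgen = abs( mod(a*b, m) )
end function seedgen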

It's also sometimes handy to be able to pass in the time, rather than reading the clock from within seedgen, so that when you are testing you can give it fixed values that then generate a reproducible (== testable) sequence.
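A minimal C sketch of that testable variant follows. The name seedgen_from and its signature are hypothetical, not from the cited text. It also reduces each factor modulo 104729 before multiplying, which gives the same result as the original formula whenever that formula does not overflow.

```c
#include <stdlib.h>

/* Hypothetical variant of seedgen: the caller supplies the time and PID,
 * so a test can pass fixed values and get a reproducible seed.
 * Each factor is reduced modulo m first to keep the product small. */
long long seedgen_from(long long s, long long pid)
{
    const long long m = 104729;
    long long a = (s * 181) % m;
    long long b = ((pid - 83) * 359) % m;

    return llabs((a * b) % m);
}
```

In production code this would be called as seedgen_from(time(NULL), getpid()). In a test, fixed arguments such as seedgen_from(1000, 100) always yield the same seed. Note that a PID of 83 makes the second factor zero, so the seed is 0 regardless of the time.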



Answer 2:

System time is usually returned in (or at least easily converted into) an integer type: simply add the rank of the process to the value and use that to seed the random number generator.
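A minimal C sketch of this, assuming the rank comes from MPI_Comm_rank and the time from time(NULL). Both are passed in as arguments here so the sketch has no MPI dependency, and the function name seed_from_time_and_rank is hypothetical.

```c
#include <time.h>

/* Hypothetical helper: derive a per-process seed by adding the MPI rank
 * to a timestamp. Ranks within one run get consecutive, distinct seeds. */
unsigned long seed_from_time_and_rank(time_t now, int rank)
{
    return (unsigned long)now + (unsigned long)rank;
}
```

One caveat with plain addition: rank 1 at time t and rank 0 at time t+1 get the same seed, so runs started within a few seconds of each other can share seeds across ranks. Whether that matters depends on the application.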