Generating the same random numbers with threads in OpenMP


Question:

I am attempting to multithread some code with OpenMP. Currently my sequential version uses rand() with a fixed seed to generate a set of random numbers, so that it returns the same results on every run. I want to parallelise my code, but rand() is not thread safe. Can someone please show me how to use a random number generator that works across threads, so I can produce the same data set on each test, just as I do by seeding rand()? The code I am parallelising is as follows:

    long linkCnt;
    long j;
    long t;
    srand(randInit);

    linkCnt=0; //Keep track of # of outgoing edges
    #pragma omp parallel for schedule(runtime) private(t)
    for(j = 0; j < G->N; j++)
    {

        if(i == j){
            t = NO_CONN;
        } else {
            t = (rand() % ((MAX_EDGE_WEIGTH-1) * 2)+1); //50% chance of having no connection
            if(t > MAX_EDGE_WEIGTH){
                //t = INF; //Like no connection
                t = NO_CONN; //Like no connection
            } else {
                linkCnt++;
                G->visited[j] = VISITED; //Do this to find isolated nodes that have no incoming edge
            }
        }

        G->node[i][j] = t;
    }

Answer 1:

There seem to be a couple problems getting conflated here.

First, the non-thread-safe nature of rand() means that calling it at the same time from different threads might yield different values than if it were called sequentially. It's probably easiest to explain this with a simple example, so let's look at a 32-bit version of PCG, since it's nice and simple. It has 32 bits of state and generates 32-bit numbers like this:

static uint32_t state = ...;

static uint32_t generate(void) {
  uint32_t s = state;                                       /* read the current state */
  uint32_t v = ((s >> ((s >> 28) + 4)) ^ s) * 277803737U;   /* output permutation */
  v ^= v >> 22;                                             /* final xorshift */
  state = state * 747796405U + 1729U;                       /* advance the LCG state */
  return v;
}

Now think about what happens if two threads call generate() at roughly the same time. Maybe they both get the same value for state, and so generate the same random number twice. Maybe one updates state before the other reads it, so they get different values.

We can eliminate the problem by protecting the generate() function with a mutex or, in the case of 32-bit PCG (and this is why I use it for reproducible numbers), using atomics. If we do that, we'll always get the same numbers in the same order.
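
A minimal sketch of the atomic variant in C11 (the seed value and the name generate_atomic are mine, added only for illustration) could look like this:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t state = 12345u;  /* any fixed seed gives reproducible output */

static uint32_t generate_atomic(void) {
  uint32_t s = atomic_load(&state);
  /* Claim the current state and advance it in one atomic step; on failure
     s is reloaded with the latest value and we try again. */
  while (!atomic_compare_exchange_weak(&state, &s, s * 747796405U + 1729U))
    ;
  uint32_t v = ((s >> ((s >> 28) + 4)) ^ s) * 277803737U;  /* output permutation */
  v ^= v >> 22;
  return v;
}

This only makes the generator safe to call from several threads at once; it does not, by itself, control which thread gets which number.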

Part two of the problem is what happens when the order of the callers gets mixed up in your code. Let's say you have two threads (called A and B), and they each have to run two iterations of your loop. Even if you're getting your random numbers from a thread-safe source, the order of the calls could be AABB, ABAB, ABBA, BBAA, BABA, or BAAB, each of which would cause your code to generate a different result.

There are several straightforward ways around this. First, you could use synchronization primitives to ensure that each iteration calls generate() in the order you want. The easiest way would probably be to use a queue, but you'll waste a lot of time synchronizing, you'll lose some opportunities for parallelism, and you'll have to restructure your code significantly.
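
If you are using OpenMP anyway, one concrete way to enforce that order (not the queue idea above, but a related approach) is an ordered region; a rough sketch, where n and do_stuff are placeholders for your own loop bound and body:

#pragma omp parallel for ordered schedule(static)
for (long j = 0; j < n; j++) {
  uint32_t r;
  #pragma omp ordered
  {
    r = generate();  /* numbers are handed out in loop order */
  }
  do_stuff(j, r);    /* the rest of the iteration can still run in parallel */
}

The ordered region serialises the generate() calls, so any speedup comes only from the work that remains outside it.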

If you have a relatively small number of iterations, you might consider generating an array ahead of time. Think:

int i;
int nums[LEN];
for (i = 0 ; i < LEN ; i++)
  nums[i] = generate();
#pragma omp parallel for ...
for (i = 0 ; i < LEN ; i++) {
  do_stuff(nums[i]);
}

A better solution, though, might be to just ditch the idea of generating random numbers altogether and instead use hashing. https://burtleburtle.net/bob/hash/integer.html has some options. An example of that would be something like

for (int i = 0 ; i < LEN ; i++) {
  do_stuff(hash(i));
}

Of course you can throw in some salt, maybe even use rand() to generate your salt.
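
As a concrete sketch, here is the 32-bit finaliser from MurmurHash3 used as the hash, with a salt mixed in; the salt handling, LEN and do_stuff are placeholders of mine, and the integer mixers on the page linked above would work just as well:

#include <stdint.h>
#include <stdlib.h>

static uint32_t hash(uint32_t h) {  /* MurmurHash3 32-bit finaliser */
  h ^= h >> 16;
  h *= 0x85ebca6bU;
  h ^= h >> 13;
  h *= 0xc2b2ae35U;
  h ^= h >> 16;
  return h;
}

/* ... inside some function ... */
uint32_t salt = (uint32_t)rand();  /* seeded once with srand(), so runs are repeatable */

#pragma omp parallel for
for (long i = 0; i < LEN; i++) {
  do_stuff(hash((uint32_t)i ^ salt));  /* same i and same salt always give the same value */
}

Because each value depends only on i and the salt, the result is identical no matter how the iterations are scheduled across threads.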



Answer 2:

Here is a block-based approach that divides the problem space into N/BLOCK_SIZE blocks and reseeds the RNG with your randInit + block number for each block. This gives reproducible output no matter how many threads you have. It also generates the same first N numbers as a longer sequence of N + x, as long as you keep the same BLOCK_SIZE.

A good block size is probably something like your typical N / (max_num_procs * 2), but there is room for experimentation.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_DEFAULT  48   //Default number of nodes 
#define BLOCK_SIZE 12   //Number of nodes per block.
                        //Changing this changes the reseed frequency,
                        //and hence the generated sequence.
#define randInit   42   //Had to be something.

int main(int argc , char* argv[])
{
    int N=N_DEFAULT;
    if (argc >1)
        N=atoi(argv[1]); 

    int rands[N];// keep our random numbers for sequential debug output
    int n=BLOCK_SIZE;
    int num_blocks=(N+BLOCK_SIZE-1)/ BLOCK_SIZE;  // ceil(N/BLOCK_SIZE)
    int nt=omp_get_max_threads();   


    printf("          N: %d\n",N);
    printf("     Blocks: %d, (size: %d)\n",num_blocks,n);
    printf("    threads: %d\n",nt);

    //Parallel random generation
    #pragma omp parallel for schedule(runtime) 
    for (int J=0;J<num_blocks;J++)
    {
        int block_seed=randInit+J;      // unique block seed
        int start = J * n;
        int end= start+n > N?N:start+n;

        int tid = omp_get_thread_num(); // Just for debug
        printf("thread %d: works on block %d (%d - %d )\n",tid,J,start,end);

        for (int j=start; j < end;j++)
        {           
            int t=rand_r(&block_seed); //rand_r is a re-entrant (thread-safe) rand
            rands[j]=t;
        }
    }

    //Output for debug single thread
    for (int j=0; j < N;j++)
    {
        printf("%d : %d \n",j,rands[j]);
    }   
    return 0;
}

Output with different N and numbers of threads is shown below.

N: 24                                   N: 27
Blocks: 3, (size: 8)                    Blocks: 4, (size: 8)
threads: 4                              threads: 1
-------------------------------------|-------------------------------
thread 1: works on block 1 (8 - 16 )    thread 0: works on block 0 (0 - 8 )
thread 2: works on block 2 (16 - 24 )   thread 0: works on block 1 (8 - 16 )
thread 0: works on block 0 (0 - 8 )     thread 0: works on block 2 (16 - 24 )
                                        thread 0: works on block 3 (24 - 27 )
-------------------------------------|-------------------------------
0 : 681191333                           0 : 681191333
1 : 928546885                           1 : 928546885
2 : 1457394273                          2 : 1457394273
3 : 941445650                           3 : 941445650
4 : 2129613237                          4 : 2129613237
5 : 1661015563                          5 : 1661015563
6 : 2071432601                          6 : 2071432601
7 : 222443696                           7 : 222443696
8 : 1156886562                          8 : 1156886562
9 : 398918689                           9 : 398918689
10 : 170756699                         10 : 170756699
11 : 703115845                         11 : 703115845
12 : 1424182583                        12 : 1424182583
13 : 1516198481                        13 : 1516198481
14 : 1740837599                        14 : 1740837599
15 : 1148851528                        15 : 1148851528
16 : 1633630368                        16 : 1633630368
17 : 2015727614                        17 : 2015727614
18 : 1031602773                        18 : 1031602773
19 : 463737465                         19 : 463737465
20 : 720848057                         20 : 720848057
21 : 1369285272                        21 : 1369285272
22 : 1411290150                        22 : 1411290150
23 : 2074210785                        23 : 2074210785
-------------------------------------|-------------------------------
                                       24 : 2109326622
                                       25 : 1486099418
                                       26 : 1892448847