Random seed across different PBS jobs

Posted 2019-07-11 20:31

I am trying to create random numbers in Matlab which will be different across multiple PBS jobs (I am using a job array). Each Matlab job uses a parallel parfor loop in which random numbers are generated, something like this:

parfor k = 1:10      
  tmp = randi(100, [1 200]);
end

However, when I plot my results, I see that the results from different jobs are not completely random. I cannot quantify it, e.g. by saying the numbers are exactly the same, since my results are a function of the random numbers, but it is unmistakable in the plots. I tried to initialize the random seed in each job, using the process id and/or the clock:

rngSeed = feature('getpid'); % OR: rngSeed = RandStream.shuffleSeed;
rng(rngSeed);

But this didn't solve the problem. I also tried to pause for a different number of seconds in each job, before using the shuffleSeed (which is clock based).

All this made me think the parfor is somehow messing with the random seed - and it makes sense, if the parfor needs to make sure you get different random numbers across different iterations of the parfor.

My questions are: is this really the case, and how can I solve it and get randomness across different PBS jobs?

EDIT: Running 4 jobs, each using parfor with 2 workers, I verified that although each job has its own seed (set outside the parfor), the numbers generated are identical across jobs (not across iterations of the parfor - that is handled by Matlab).

EDIT 2: Trying what was suggested by @Sam Roberts, I used the following code:

matlabpool open local 2
st = RandStream('mlfg6331_64');
RandStream.setGlobalStream(st);
rng('shuffle');

parfor n = 1:4       
  x=randi(100,[1 10]);
  fprintf('%d ',x(:)');
  fprintf('\n')
end
matlabpool close

but I still get the same numbers on different calls to the above script.

1 answer
Emotional °昔
#2 · 2019-07-11 21:27

You may want to look into using random substreams, for correct randomness and reproducibility when running in parallel.

The RandStream class allows you to create a pseudorandom number stream - numbers drawn from this stream have the properties you'd hope for (independence etc) and, if you control the seed, you also have reproducibility.
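As a minimal sketch of the reproducibility point (the generator type and the seed value 42 are arbitrary choices for illustration, not something from the question), a stream created with an explicit seed gives the same numbers again once it is reset:

```matlab
% Create a stream with an explicit seed (42 is an arbitrary choice).
s = RandStream('mrg32k3a', 'Seed', 42);
a = randi(s, 100, [1 5]);

% Rewind the stream to its initial state and draw again.
reset(s);
b = randi(s, 100, [1 5]);

% The two draws should match, since both start from the same state.
disp(isequal(a, b));
```

Passing the stream as the first argument keeps the draw independent of whatever the global stream happens to be at that point.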

However, a subsequence of the stream, for example every second or every fourth number drawn from it, does not necessarily have the same properties. In addition, when you use parfor you have no control over the order in which the loop iterations are run, which means that you will lose reproducibility. To deal with both issues, you can use a different substream on each worker within a parfor loop.

Some RNGs, for example mlfg6331_64, a multiplicative lagged Fibonacci generator, or mrg32k3a, a combined multiple recursive generator, support substreams: independent streams that are generated by the same RNG, but which retain the same pseudorandom properties and can be selected separately, retaining reproducibility. In addition, many MATLAB and Toolbox functions have the options 'UseParallel' and 'UseSubstreams', which tell them to handle this for you automatically.
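Applied to the job-array case in the question, this might look roughly like the sketch below. It is only an illustration under some assumptions: the job index is read from an environment variable named PBS_ARRAYID (the actual name depends on the scheduler), job indices start at 1, and every job runs the same number of iterations. The stream is built inside the loop, so it does not depend on how the workers' global streams were initialized, and each (job, iteration) pair selects its own substream.

```matlab
% Assumed job index: PBS_ARRAYID is a guess at the scheduler's variable
% name; fall back to 1 when it is not set (str2double('') returns NaN).
jobId = str2double(getenv('PBS_ARRAYID'));
if isnan(jobId)
    jobId = 1;
end

nIter = 4;               % iterations per job, assumed equal in all jobs
x = zeros(nIter, 10);
parfor n = 1:nIter
    % Same generator and seed everywhere; the substream index is unique
    % for each (job, iteration) pair.
    s = RandStream('mlfg6331_64', 'Seed', 0);
    s.Substream = (jobId - 1) * nIter + n;
    x(n, :) = randi(s, 100, [1 10]);
end
disp(x);
```

Because the substream depends only on the job index and the iteration number, the result for a given job should not depend on which worker runs which iteration, while different jobs draw from different substreams.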

Although the above is documented at a technical level within the MATLAB documentation, it's kind of hard to find. There's a much more explanatory guide within Statistics Toolbox documentation (should really be moved to MATLAB if you ask me). You can read it online here.

Hope that helps!
