Having problems keeping a simulation deterministic

Posted 2019-07-23 07:58

Question:

I have a very large simulation in Python with lots of modules, and I call a lot of random functions. To keep the random results the same across runs, I have a variable keep_seed_random.

Like so:

import random

keep_seed_random = True

if keep_seed_random is False:
    fixed_seed = random.Random(0)
else:
    fixed_seed = random

Then I use fixed_seed all over the program, such as

fixed_seed.choice(['male', 'female'])
fixed_seed.randint(1, 10)  # randint needs both bounds; these values are only illustrative
fixed_seed.gammavariate(3, 3)
fixed_seed.random()
fixed_seed.randrange(20, 40)

and so on...

It used to work well. But now that the program has grown large, something else is interfering and the results are no longer identical, even when I set keep_seed_random = False.

Is there any other source of randomness in Python that I am missing?

P.S. I import random just once.

EDITED

We have been trying to pinpoint the exact moment when the program went from giving exactly the same results to giving different results. It seems to have been when we introduced a lot of database reads that have no connection to the random module.

The results now ALTERNATE between two similar values. That is, I run main.py once and get 8148.78 for GDP; I run it again and get 7851.49; again, back to 8148.78; again, 7851.49.

Also, in the working version before the change, the first run (when we create the instances and save them with pickle) gives one result. From the second run onwards, the results are all the same. So I am guessing it is related to pickle dumping/loading.

The question remains!

2nd EDITED

We partially found the problem. It occurs when we create instances, dump them with pickle, and then load them again.

We still cannot get exactly the same results from a run that creates the instances and a run that only loads them. However, when loading repeatedly, the results are identical.

Thus, the problem is in pickle. Some randomization may occur when dumping and loading (I guess).

Thanks,

Answer 1:

This is difficult to diagnose without a good reproducible case, as @mart0903 mentions. However, in general, there are several possible sources of nondeterminism. A few things come to mind:

If, for example, you are using the multiprocessing and/or subprocess packages to spawn several parallel processes, you may be experiencing a race condition: different processes finish at different times each time you run the program. Perhaps you are combining the results in some way that depends on these processes finishing in a particular order.
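
One way to guard against completion-order effects is to tag each worker's result with a task ID and combine the results in ID order instead of arrival order. The sketch below is only an illustration under that assumption: `combine_in_fixed_order` and the sample values are hypothetical, and the two lists stand in for the same results arriving in a different order on two runs.

```python
def combine_in_fixed_order(results):
    """Sum (task_id, value) pairs in task_id order, not arrival order."""
    ordered = sorted(results, key=lambda pair: pair[0])
    total = 0.0
    for _task_id, value in ordered:
        total += value
    return total


# The same three results, arriving in a different order on each run.
run_a = [(0, 0.1), (1, 0.2), (2, 0.3)]
run_b = [(2, 0.3), (1, 0.2), (0, 0.1)]

# Summing in arrival order is not bit-for-bit reproducible, because
# floating-point addition is not associative.
naive_a = sum(value for _, value in run_a)
naive_b = sum(value for _, value in run_b)

# Summing in task_id order gives the same total for both runs.
fixed_a = combine_in_fixed_order(run_a)
fixed_b = combine_in_fixed_order(run_b)
```

Small differences like the one between `naive_a` and `naive_b` can grow over many simulation steps, which is one way two runs can end up with visibly different aggregates.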

Perhaps you are simply looping over a dictionary or a set and expecting the items to be in a certain order, when in fact the order is not guaranteed. For example, run the following a couple of times in a row and you'll notice that the key-value pairs print out in a different order each time (I'm using Python 3.5; since Python 3.7, dictionaries preserve insertion order, but sets are still unordered):

if __name__=='__main__':
    data = dict()
    data['a'] = 6
    data['b'] = 7
    data['c'] = 42
    for key in data:
        print(key + ' : ' + str(data[key]))
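
Iteration order matters for reproducibility because it can leak into random draws: picking from `list(some_set)` with a seeded generator still depends on the order the set happens to yield. Here is a minimal sketch of one possible guard, sorting before drawing. `pick_agent` and the names are hypothetical, not taken from the original program.

```python
import random


def pick_agent(names, rng):
    """Draw one name reproducibly from an unordered collection."""
    # Sorting gives the draw a fixed order to index into, independent of
    # how the set or dict happens to iterate in this process.
    return rng.choice(sorted(names))


# Same members, built in a different order.
first = {"alice", "bob", "carol", "dave"}
second = {"dave", "carol", "bob", "alice"}

pick_one = pick_agent(first, random.Random(0))
pick_two = pick_agent(second, random.Random(0))
```

For string keys, the hash randomization behind this can also be pinned for a whole run by setting the PYTHONHASHSEED environment variable (for example, to 0) before starting the interpreter.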

You might even be reading timestamps to set some value, or perhaps generating a UUID somewhere that you are using in a calculation.
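
If identifiers do feed into a calculation, one option is to derive them from the seeded generator. A minimal sketch, assuming a hypothetical helper `seeded_uuid`: it builds a version-4-style UUID from 128 bits drawn from a `random.Random` instance instead of calling `uuid.uuid4()`, which draws from the operating system's entropy source and so differs on every run.

```python
import random
import uuid


def seeded_uuid(rng):
    """Build a UUID from the given generator so it repeats across runs."""
    # version=4 makes the constructor set the version and variant bits.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two generators with the same seed stand in for two separate runs.
id_run_1 = seeded_uuid(random.Random(0))
id_run_2 = seeded_uuid(random.Random(0))
```

The same idea applies to timestamps: passing the "current time" in as a parameter, rather than reading the clock inside the model, keeps it out of the results.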

The possibilities could go on. But again, it is difficult to nail down without a simple reproducible case. It may just take some good ol' breakpoints and a lot of stepping through code.
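
One way to narrow the search without stepping through everything is to log a short fingerprint of the generator's state at a few checkpoints and compare the logs of two runs. A minimal sketch; `rng_fingerprint` is a hypothetical helper, not part of the `random` module.

```python
import hashlib
import random


def rng_fingerprint(rng):
    """Return a short hash of the generator's internal state."""
    state_bytes = repr(rng.getstate()).encode("utf-8")
    return hashlib.sha256(state_bytes).hexdigest()[:12]


# Two generators with the same seed stand in for two separate runs.
rng_a = random.Random(0)
rng_b = random.Random(0)
same_at_start = rng_fingerprint(rng_a) == rng_fingerprint(rng_b)

rng_b.random()  # one extra draw in the second "run"
diverged = rng_fingerprint(rng_a) != rng_fingerprint(rng_b)
```

The first checkpoint where the fingerprints differ brackets the code that consumed a different number of random draws. It will not catch cases where the number of draws is the same but the inputs to them differ, such as a `choice` over a differently ordered list.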

Good luck!