Is random.sample truly random?

Posted 2019-09-21 20:12

Question:

I have a list of 155k files. When I call random.sample(list, 100), the results are not identical to the previous sample, but they look similar.

Is there a better alternative to random.sample that returns a new list of 100 random files?

import glob
import random

# get_all_folders is the asker's own helper (not shown); assumed to return folder names
folders = get_all_folders('/data/gazette-txt-files')

# get all files from all folders
def get_all_files():
    files = []
    for folder in folders:
        # extend (rather than append) keeps the list flat, so no 2D-to-1D step is needed
        files.extend(glob.glob("/data/gazette-txt-files/" + folder + "/*.txt"))

    # 200 random text files
    return random.sample(files, 200)

Answer 1:

For purposes like randomly selecting elements from a list, random.sample suffices. It does not provide true randomness, and I'm unaware whether that is even theoretically possible.

random (by default) uses a pseudorandom number generator (PRNG) called the Mersenne Twister (MT). It is suitable for applications such as simulations (and minor things like picking from a list of paths), but because it is deterministic it shouldn't be used where security is a concern.
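To make "deterministic" concrete, here is a minimal sketch. The paths list is a hypothetical stand-in for the asker's 155k files, and the seed value 42 is arbitrary. Seeding the generator twice with the same value should reproduce the same sample, while a further call without reseeding draws from the advanced state.

```python
import random

# Hypothetical stand-in for the 155k file paths in the question.
paths = [f"file_{i}.txt" for i in range(155_000)]

random.seed(42)  # fixed seed, value chosen arbitrarily for illustration
first = random.sample(paths, 100)

random.seed(42)  # same seed again, so the generator restarts from the same state
second = random.sample(paths, 100)

# Identical seed and identical calls give an identical sample.
same_seed_repeats = (first == second)

# Without reseeding, the generator state has advanced, so this draw differs.
third = random.sample(paths, 100)
```

This reproducibility is what makes the Mersenne Twister convenient for simulations and unsuitable for secrets: anyone who knows the state can predict the output.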

This is why Python 3.6 introduced the secrets module (PEP 506), which uses SystemRandom (backed by os.urandom) by default and is capable of producing cryptographically secure pseudorandom numbers.
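If an OS-backed source is wanted for the sampling itself, one possible approach is random.SystemRandom, which offers the same sample method as the module-level function. In the sketch below, the paths list is again a hypothetical stand-in for the real file list. For picking 100 files to inspect, this is likely more than is needed.

```python
import random
import secrets

# Hypothetical stand-in for the 155k file paths in the question.
paths = [f"file_{i}.txt" for i in range(155_000)]

# SystemRandom draws from os.urandom and ignores random.seed().
sysrand = random.SystemRandom()
picked = sysrand.sample(paths, 100)  # 100 distinct paths

# The secrets module exposes convenience helpers built on the same source.
one_path = secrets.choice(paths)
```

Because SystemRandom cannot be seeded, its results cannot be reproduced, which is worth keeping in mind if a repeatable sample is ever needed.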

Of course, the bottom line is that whether you use a PRNG or a CSPRNG to generate your numbers, they are still going to be pseudorandom.



Answer 2:

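You may need to seed the generator. See random.seed in the documentation for the random module.

Just call random.seed() before you draw the samples. With no argument, it seeds from the operating system's randomness source if one is available, and otherwise from the current time.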
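One way to check whether reseeding changes anything is to count how many items two samples share. The sketch below assumes a hypothetical paths list standing in for the 155k files, and overlap is an illustrative helper, not part of the random module. As far as I know, the module already seeds itself when it is imported, so successive samples should differ with or without an explicit random.seed() call. For 100 items drawn from 155,000, the expected overlap between two independent samples is about 100 * 100 / 155000, or roughly 0.06 items.

```python
import random

# Hypothetical stand-in for the 155k file paths in the question.
paths = [f"file_{i}.txt" for i in range(155_000)]

def overlap(a, b):
    """Illustrative helper: number of items two samples have in common."""
    return len(set(a) & set(b))

# Two draws with no explicit seeding.
a = random.sample(paths, 100)
b = random.sample(paths, 100)

# Reseed from the OS randomness source (or the clock), then draw again.
random.seed()
c = random.sample(paths, 100)

shared_without_reseed = overlap(a, b)
shared_after_reseed = overlap(b, c)
```

If both counts are near zero, the samples are not actually similar, and the impression of similarity may come from the file names themselves, for example shared folder prefixes.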