Random number generator that returns only one numb

2019-04-14 17:38发布

Does Python have a random number generator that returns only one random integer number each time when next() function is called? Numbers should not repeat and the generator should return random integers in the interval [1, 1 000 000] that are unique.

I need to generate more than million different numbers and that sounds as if it is very memory consuming in case all the number are generated at same time and stored in a list.

6条回答
Lonely孤独者°
2楼-- · 2019-04-14 18:16

You are looking for a linear congruential generator with a full period. This will allow you to get a pseudo-random sequence of non-repeating numbers in your target number range.

Implementing a LCG is actually very simple, and looks like this:

def lcg(a, c, m, seed = None):
    num = seed or 0
    while True:
        num = (a * num + c) % m
        yield num

Then, it just comes down to choosing the correct values for a, c, and m to guarantee that the LCG will generate a full period (which is the only guarantee that you get non-repeating numbers). As the Wikipedia article explains, the following three conditions need to be true:

  1. m and c need to be relatively prime.
  2. a - 1 is divisible by all prime factors of m
  3. a - 1 is divisible by 4, if m is also divisible by 4.

The first one is very easily guaranteed by simply choosing a prime for c. Also, this is the value that can be chosen last, and this will ultimately allow us to mix up the sequence a bit.

The relationship between a - 1 and m is more complicated though. In a full period LCG, m is the length of the period. Or in other words, it is the number range your numbers come from. So this is what you are usually choosing first. In your case, you want m to be around 1000000. Choosing exactly your maximum number might be difficult since that restricts you a lot (in both your choice of a and also c), so you can also choose numbers larger than that and simply skip all numbers outside of your range later.

Let’s choose m = 1000000 now though. The prime factors of m are 2 and 5. And it’s also obviously divisible by 4. So for a - 1, we need a number that is a multiple of 2 * 2 * 5 to satisfy the conditions 2 and 3. Let’s choose a - 1 = 160, so a = 161.

For c, we are using a random prime that’s somewhere in between of our range: c = 506903

Putting that into our LCG gives us our desired sequence. We can choose any seed value from the range (0 <= seed <= m) as the starting point of our sequence.

So let’s try it out and verify that what we thought of actually works. For this purpose, we are just collecting all numbers from the generator in a set until we hit a duplicate. At that point, we should have m = 1000000 numbers in the set:

>>> g = lcg(161, 506903, 1000000)
>>> numbers = set()
>>> for n in g:
        if n in numbers:
            raise Exception('Number {} already encountered before!'.format(n))
        numbers.add(n)

Traceback (most recent call last):
  File "<pyshell#5>", line 3, in <module>
    raise Exception('Number {} already encountered before!'.format(n))
Exception: Number 506903 already encountered before!
>>> len(numbers)
1000000

And it’s correct! So we did create a pseudo-random sequence of numbers that allowed us to get non-repeating numbers from our range m. Of course, by design, this sequence will be always the same, so it is only random once when you choose those numbers. You can switch up the values for a and c to get different sequences though, as long as you maintain the properties mentioned above.


The big benefit of this approach is of course that you do not need to store all the previously generated numbers. It is a constant space algorithm as it only needs to remember the initial configuration and the previously generated value.

It will also not deteriorate as you get further into the sequence. This is a general problem with solutions that just keep generating a random number until a new one is found that hasn’t been encountered before. This is because the longer the list of generated numbers gets, the less likely you are going to hit a numbers that’s not in that list with an evenly distributed random algorithm. So getting the 1000000th number will likely take you a long time to generate with memory based random generators.

But of course, having this simply algorithm which just performs some multiplication and some addition does not appear very random. But you have to keep in mind that this is actually the basis for most pseudo-random number generators out there. So random.random() uses something like this internally. It’s just that the m is a lot larger, so you don’t notice it there.

查看更多
该账号已被封号
3楼-- · 2019-04-14 18:18

Considering your numbers should fit in a 64bit integer, one million of them stored in a list would be up to 64 mega bytes plus the list object overhead, if your processing computer can afford that the easyest way is to use shuffle:

import random
randInts = list(range(1000000))
random.shuffle(randInts)
print(randInts)

Note that the other method is to keep track of the previously generated numbers, which will get you to the point of having all of them stored too.

查看更多
爷的心禁止访问
4楼-- · 2019-04-14 18:28

For a large number of non-repeating random numbers use an encryption. With a given key, encrypt the numbers: 0, 1, 2, 3, ... Since encryption is uniquely reversible then each encrypted number is guaranteed to be unique, provided you use the same key. For 64 bit numbers use DES. For 128 bit numbers use AES. For other size numbers use some Format Preserving Encryption. For pure numbers you might find Hasty Pudding cipher useful as that allows a large range of different bit sizes and non-bit sizes as well, like [0..5999999].

Keep track of the key and the last number you encrypted. When you need a new unique random number just encrypt the next number you haven't used so far.

查看更多
男人必须洒脱
5楼-- · 2019-04-14 18:30
import random 

# number of random entries 
x = 1000

# The set of all values 
y = {}
while (x > 0) :
    a = random.randint(0 , 10**10)
    if a not in y :  
        a -= 1

This way you are sure you have perfectly random unique values x represents the number of values you want

查看更多
我欲成王,谁敢阻挡
6楼-- · 2019-04-14 18:35

If you really care about the memory you could use a NumPy array (or a Python array).

A one million NumPy array of int32 (more than enough to contain integers between 0 and 1 000 000) will only consume ~4MB, Python itself would require ~36MB (roughly 28byte per integer and 8 byte for each list element + overallocation) for an identical list:

>>> # NumPy array
>>> import numpy as np
>>> np.arange(1000000, dtype=np.int32).nbytes
4 000 000

>>> # Python list
>>> import sys
>>> import random
>>> l = list(range(1000000))
>>> random.shuffle(l)
>>> size = sys.getsizeof(l)                         # size of the list
>>> size += sum(sys.getsizeof(item) for item in l)  # size of the list elements
>>> size
37 000 108

You only want unique values and you have a consecutive range (1 million requested items and 1 million different numbers), so you could simply shuffle the range and then yield items from your shuffled array:

def generate_random_integer():
    arr = np.arange(1000000, dtype=np.int32)
    np.random.shuffle(arr)
    yield from arr 
    # yield from is equivalent to:
    # for item in arr:     
    #     yield item

And it can be called using next:

>>> gen = generate_random_integer()
>>> next(gen)
443727

However that will throw away the performance benefit of using NumPy, so in case you want to use NumPy don't bother with the generator and just perform the operations (vectorized - if possible) on the array. It consumes much less memory than Python and it could be orders of magnitude faster (factors of 10-100 faster are not uncommon!).

查看更多
该账号已被封号
7楼-- · 2019-04-14 18:40

You can easily make one yourself:

from random import random

def randgen():
    while True:
        yield random()


ran = randgen()
next(ran)  
next(ran)
...
查看更多
登录 后发表回答