Does Python have a random number generator that returns only one random integer number each time when next()
function is called? Numbers should not repeat and the generator should return random integers in the interval [1, 1 000 000]
that are unique.
I need to generate more than million different numbers and that sounds as if it is very memory consuming in case all the number are generated at same time and stored in a list.
You are looking for a linear congruential generator with a full period. This will allow you to get a pseudo-random sequence of non-repeating numbers in your target number range.
Implementing a LCG is actually very simple, and looks like this:
Then, it just comes down to choosing the correct values for
a
,c
, andm
to guarantee that the LCG will generate a full period (which is the only guarantee that you get non-repeating numbers). As the Wikipedia article explains, the following three conditions need to be true:m
andc
need to be relatively prime.a - 1
is divisible by all prime factors ofm
a - 1
is divisible by 4, ifm
is also divisible by 4.The first one is very easily guaranteed by simply choosing a prime for
c
. Also, this is the value that can be chosen last, and this will ultimately allow us to mix up the sequence a bit.The relationship between
a - 1
andm
is more complicated though. In a full period LCG,m
is the length of the period. Or in other words, it is the number range your numbers come from. So this is what you are usually choosing first. In your case, you wantm
to be around1000000
. Choosing exactly your maximum number might be difficult since that restricts you a lot (in both your choice ofa
and alsoc
), so you can also choose numbers larger than that and simply skip all numbers outside of your range later.Let’s choose
m = 1000000
now though. The prime factors ofm
are2
and5
. And it’s also obviously divisible by4
. So fora - 1
, we need a number that is a multiple of2 * 2 * 5
to satisfy the conditions 2 and 3. Let’s choosea - 1 = 160
, soa = 161
.For
c
, we are using a random prime that’s somewhere in between of our range:c = 506903
Putting that into our LCG gives us our desired sequence. We can choose any seed value from the range (
0 <= seed <= m
) as the starting point of our sequence.So let’s try it out and verify that what we thought of actually works. For this purpose, we are just collecting all numbers from the generator in a set until we hit a duplicate. At that point, we should have
m = 1000000
numbers in the set:And it’s correct! So we did create a pseudo-random sequence of numbers that allowed us to get non-repeating numbers from our range
m
. Of course, by design, this sequence will be always the same, so it is only random once when you choose those numbers. You can switch up the values fora
andc
to get different sequences though, as long as you maintain the properties mentioned above.The big benefit of this approach is of course that you do not need to store all the previously generated numbers. It is a constant space algorithm as it only needs to remember the initial configuration and the previously generated value.
It will also not deteriorate as you get further into the sequence. This is a general problem with solutions that just keep generating a random number until a new one is found that hasn’t been encountered before. This is because the longer the list of generated numbers gets, the less likely you are going to hit a numbers that’s not in that list with an evenly distributed random algorithm. So getting the 1000000th number will likely take you a long time to generate with memory based random generators.
But of course, having this simply algorithm which just performs some multiplication and some addition does not appear very random. But you have to keep in mind that this is actually the basis for most pseudo-random number generators out there. So
random.random()
uses something like this internally. It’s just that them
is a lot larger, so you don’t notice it there.Considering your numbers should fit in a 64bit integer, one million of them stored in a list would be up to 64 mega bytes plus the list object overhead, if your processing computer can afford that the easyest way is to use shuffle:
Note that the other method is to keep track of the previously generated numbers, which will get you to the point of having all of them stored too.
For a large number of non-repeating random numbers use an encryption. With a given key, encrypt the numbers: 0, 1, 2, 3, ... Since encryption is uniquely reversible then each encrypted number is guaranteed to be unique, provided you use the same key. For 64 bit numbers use DES. For 128 bit numbers use AES. For other size numbers use some Format Preserving Encryption. For pure numbers you might find Hasty Pudding cipher useful as that allows a large range of different bit sizes and non-bit sizes as well, like [0..5999999].
Keep track of the key and the last number you encrypted. When you need a new unique random number just encrypt the next number you haven't used so far.
This way you are sure you have perfectly random unique values
x
represents the number of values you wantIf you really care about the memory you could use a
NumPy
array (or a Pythonarray
).A one million NumPy array of
int32
(more than enough to contain integers between 0 and 1 000 000) will only consume ~4MB, Python itself would require ~36MB (roughly 28byte per integer and 8 byte for each list element + overallocation) for an identical list:You only want unique values and you have a consecutive range (1 million requested items and 1 million different numbers), so you could simply shuffle the range and then yield items from your shuffled array:
And it can be called using
next
:However that will throw away the performance benefit of using NumPy, so in case you want to use NumPy don't bother with the generator and just perform the operations (vectorized - if possible) on the array. It consumes much less memory than Python and it could be orders of magnitude faster (factors of 10-100 faster are not uncommon!).
You can easily make one yourself: