I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.
I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.
I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?
Another way to do this is with a block cipher; see this blog post for details.
The blog posts links to the paper Ciphers with Arbitrary Finite Domains which contains a bunch of solutions.
One way to do it would be
p
larger thanN
, preferably not much larger.g
modulop
, that is, a number1 < g < p
such thatg^k ≡ 1 (mod p)
if and only ifk
is a multiple ofp-1
.g^k (mod p)
fork = 1, 2, ...
, ignoring the values that are larger thanN
.For every prime
p
, there areφ(p-1)
primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime
p
appropriately.Since the number of primitive roots is
φ(p-1)
, if one randomly choosesr
in the range from 1 top-1
, the expected number of tries until one finds a primitive root is(p-1)/φ(p-1)
, hence one should choosep
so thatφ(p-1)
is relatively large, that means thatp-1
must have few distinct prime divisors (and preferably only large ones, except for the factor 2).Instead of randomly choosing, one can also try in sequence whether
2, 3, 5, 6, 7, 10, ...
is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.So it boils down to checking whether a number
x
is a primitive root modulop
. Ifp-1 = q^a * r^b * s^c * ...
with distinct primesq, r, s, ...
,x
is a primitive root if and only ifthus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of
p-1
. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.
Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.
Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).
If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.
Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:
The fundamental weakness of LCGs (
x=(x*m+c)%b
style generators) is useful here.If the generator is properly formed then
x%f
is also a repeating sequence of all values lower thanf
(providedf
if a factor ofb
).Since
b
is usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.
Unfortunately LCG Is a poor generator for exactly the same reason as given above.
Also, this has exactly the same weakness as I noted in a comment on @JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.
Consider the prime 3. To fully express all possible outputs, think of it this way...
The
bias
is just an offset bias.step
is an accumulator (if it's1
for example, it would just be0, 1, 2
in sequence, while2
would result in0, 2, 4
) andprime
is the prime number we want to generate the permutations against.For example. A simple sequence of
0, 1, 2
would be...Modifying a couple of those variables for a second, we'll take
bias
of1
andstep
of2
(just for illustration)...You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of offset and bias will result in one of
prime!
possible permutations of the set. In the case of aprime
of3
you'll see that there are6
different possible permuations:If you do the math on the variables above you'll not that it results in the same information requirements...
... vs...
Restrictions are simple,
bias
must be within0..P-1
andstep
must be within1..P-1
(I have been functionally just been using0..P-2
and adding1
on arithmetic in my own work). Other than that, it works with all prime numbers no matter how large and will permutate all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly less bits than the prime itself).Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.
That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.
First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.
Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an
n
of6
would involve all possible prime generators that could match6
(in this case,2
and3
), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.Lastly, if you need a generator for
bias
and orstep
... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).