Create a random permutation of 1..N in constant sp

2019-01-08 23:42发布

站内文章 / 后端开发

8 0

太酷不给撩

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.

I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.

I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?

回答1:

The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.

Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.

Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).

If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.

回答2:

One way to do it would be

Find a prime p larger than N, preferably not much larger.
Find a primitive root of unity g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.

For every prime p, there are φ(p-1) primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.

For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.

Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).

Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.

So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if

x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...

thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.

回答3:

Another way to do this is with a block cipher; see this blog post for details.

The blog posts links to the paper Ciphers with Arbitrary Finite Domains which contains a bunch of solutions.

回答4:

Consider the prime 3. To fully express all possible outputs, think of it this way...

bias + step mod prime

The bias is just an offset bias. step is an accumulator (if it's 1 for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4) and prime is the prime number we want to generate the permutations against.

For example. A simple sequence of 0, 1, 2 would be...

0 + 0 mod 3 = 0
0 + 1 mod 3 = 1
0 + 2 mod 3 = 2

Modifying a couple of those variables for a second, we'll take bias of 1 and step of 2 (just for illustration)...

1 + 2 mod 3 = 0
1 + 4 mod 3 = 2
1 + 6 mod 3 = 1

You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of offset and bias will result in one of prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permuations:

0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0

If you do the math on the variables above you'll not that it results in the same information requirements...

1/3! = 1/6 = 1.66..

... vs...

1/3 (bias) * 1/2 (step) => 1/6 = 1.66..

Restrictions are simple, bias must be within 0..P-1 and step must be within 1..P-1 (I have been functionally just been using 0..P-2 and adding 1 on arithmetic in my own work). Other than that, it works with all prime numbers no matter how large and will permutate all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly less bits than the prime itself).

Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.

That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.

First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.

Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.

Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).

回答5:

The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.

If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f if a factor of b).

Since bis usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.

This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.

Unfortunately LCG Is a poor generator for exactly the same reason as given above.

Also, this has exactly the same weakness as I noted in a comment on @JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.

回答6:

Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:

def random_safe_prime(lbound):
    while True:
        q = random_prime(lbound, lbound=lbound // 2)
        p = 2 * q + 1
        if is_prime(p):
            return p, q


def random_permutation(n):
    p, q = random_safe_prime(n + 2)

    while True:
        r = randint(2, p - 1)
        if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
            i = 1
            while True:
                x = pow(r, i, p)
                if x == 1:
                    return

                if 0 <= x - 2 < n:
                    yield x - 2

                i += 1