Sample k random permutations without replacement i

2019-05-21 02:22发布

问题:

I need a number of unique random permutations of a list without replacement, efficiently. My current approach:

total_permutations = math.factorial(len(population))
permutation_indices = random.sample(xrange(total_permutations), k)
k_permutations = [get_nth_permutation(population, x) for x in permutation_indices]

where get_nth_permutation does exactly what it sounds like, efficiently (meaning O(N)). However, this only works for len(population) <= 20, simply because 21! is so mindblowingly long that xrange(math.factorial(21)) won't work:

OverflowError: Python int too large to convert to C long

Is there a better algorithm to sample k unique permutations without replacement in O(N)?

回答1:

Instead of using xrange simply keep generating random numbers until you have as many as you need. Using a set makes sure they're all unique.

permutation_indices = set()
while len(permutation_indices) < k:
    permutation_indices.add(random.randrange(total_permutations))


回答2:

Up to a certain point, it's unnecessary to use get_nth_permutation to get permutations. Just shuffle the list!

>>> import random
>>> l = range(21)
>>> def random_permutations(l, n):
...     while n:
...         random.shuffle(l)
...         yield list(l)
...         n -= 1
... 
>>> list(random_permutations(l, 5))
[[11, 19, 6, 10, 0, 3, 12, 7, 8, 16, 15, 5, 14, 9, 20, 2, 1, 13, 17, 18, 4], 
 [14, 8, 12, 3, 5, 20, 19, 13, 6, 18, 9, 16, 2, 10, 4, 1, 17, 15, 0, 7, 11], 
 [7, 20, 3, 8, 18, 17, 4, 11, 15, 6, 16, 1, 14, 0, 13, 5, 10, 9, 2, 19, 12], 
 [10, 14, 5, 17, 8, 15, 13, 0, 3, 16, 20, 18, 19, 11, 2, 9, 6, 12, 7, 4, 1], 
 [1, 13, 15, 18, 16, 6, 19, 8, 11, 12, 10, 20, 3, 4, 17, 0, 9, 5, 2, 7, 14]]

The odds are overwhelmingly against duplicates appearing in this list for len(l) > 15 and n < 100000, but if you need guarantees, or for lower values of len(l), just use a set to record and skip duplicates if that's a concern (though as you've observed in your comments, if n gets close to len(l)!, this will stall). Something like:

def random_permutations(l, n):    
    pset = set()
    while len(pset) < n:
        random.shuffle(l)
        pset.add(tuple(l))
    return pset

However, as len(l) gets longer and longer, random.shuffle becomes less reliable, because the number of possible permutations of the list increases beyond the period of the random number generator! So not all permutations of l can be generated that way. At that point, not only do you need to map get_nth_permutation over a sequence of random numbers, you also need a random number generator capable of producing every random number between 0 and len(l)! with relatively uniform distribution. That might require you to find a more robust source of randomness.

However, once you have that, the solution is as simple as Mark Ransom's answer.

To understand why random.shuffle becomes unreliable for large len(l), consider the following. random.shuffle only needs to pick random numbers between 0 and len(l) - 1. But it picks those numbers based on its internal state, and it can take only a finite (and fixed) number of states. Likewise, the number of possible seed values you can pass to it is finite. This means that the set of unique sequences of numbers it can generate is also finite; call that set s. For len(l)! > len(s), some permutations can never be generated, because the sequences that correspond to those permutations aren't in s.

What are the exact lengths at which this becomes a problem? I'm not sure. But for what it's worth, the period of the mersenne twister, as implemented by random, is 2**19937-1. The shuffle docs reiterate my point in a general way; see also what Wikipedia has to say on the matter here.



回答3:

I had one implementation of nth_permutation (not sure from where I got it) which I modified for your purpose. I believe this would be fast enough to suit your need

>>> def get_nth_permutation(population):
    total_permutations = math.factorial(len(population))

    while True:
        temp_population = population[:]
        n = random.randint(1,total_permutations)
        size = len(temp_population)
        def generate(s,n,population):
            for x in range(s-1,-1,-1):
                fact = math.factorial(x)
                d = n/fact
                n -= d * fact
                yield temp_population[d]
                temp_population.pop(d)
        next_perm = generate(size,n,population)
        yield [e for e in next_perm]


>>> nth_perm = get_nth_permutation(range(21))
>>> [next(nth_perm) for k in range(1,10)]


回答4:

You seem to be searching for the Knuth Shuffle! Good luck!



回答5:

You could use itertools.islice instead of xrange():

CPython implementation detail: xrange() is intended to be simple and fast Implementations may impose restrictions to achieve this. The C implementation of Python restricts all arguments to native C longs (“short” Python integers), and also requires that the number of elements fit in a native C long. If a larger range is needed, an alternate version can be crafted using the itertools module: islice(count(start, step), (stop-start+step-1+2*(step<0))//step).