I need a number of unique random permutations of a list without replacement, efficiently. My current approach:
total_permutations = math.factorial(len(population))
permutation_indices = random.sample(xrange(total_permutations), k)
k_permutations = [get_nth_permutation(population, x) for x in permutation_indices]
where get_nth_permutation
does exactly what it sounds like, efficiently (meaning O(N)). However, this only works for len(population) <= 20
, simply because 21! is so mindblowingly long that xrange(math.factorial(21))
won't work:
OverflowError: Python int too large to convert to C long
Is there a better algorithm to sample k unique permutations without replacement in O(N)?
Up to a certain point, it's unnecessary to use get_nth_permutation
to get permutations. Just shuffle the list!
>>> import random
>>> l = range(21)
>>> def random_permutations(l, n):
... while n:
... random.shuffle(l)
... yield list(l)
... n -= 1
...
>>> list(random_permutations(l, 5))
[[11, 19, 6, 10, 0, 3, 12, 7, 8, 16, 15, 5, 14, 9, 20, 2, 1, 13, 17, 18, 4],
[14, 8, 12, 3, 5, 20, 19, 13, 6, 18, 9, 16, 2, 10, 4, 1, 17, 15, 0, 7, 11],
[7, 20, 3, 8, 18, 17, 4, 11, 15, 6, 16, 1, 14, 0, 13, 5, 10, 9, 2, 19, 12],
[10, 14, 5, 17, 8, 15, 13, 0, 3, 16, 20, 18, 19, 11, 2, 9, 6, 12, 7, 4, 1],
[1, 13, 15, 18, 16, 6, 19, 8, 11, 12, 10, 20, 3, 4, 17, 0, 9, 5, 2, 7, 14]]
The odds are overwhelmingly against duplicates appearing in this list for len(l)
> 15 and n
< 100000, but if you need guarantees, or for lower values of len(l)
, just use a set
to record and skip duplicates if that's a concern (though as you've observed in your comments, if n
gets close to len(l)!
, this will stall). Something like:
def random_permutations(l, n):
pset = set()
while len(pset) < n:
random.shuffle(l)
pset.add(tuple(l))
return pset
However, as len(l)
gets longer and longer, random.shuffle
becomes less reliable, because the number of possible permutations of the list increases beyond the period of the random number generator! So not all permutations of l
can be generated that way. At that point, not only do you need to map get_nth_permutation
over a sequence of random numbers, you also need a random number generator capable of producing every random number between 0
and len(l)
! with relatively uniform distribution. That might require you to find a more robust source of randomness.
However, once you have that, the solution is as simple as Mark Ransom's answer.
To understand why random.shuffle
becomes unreliable for large len(l)
, consider the following. random.shuffle
only needs to pick random numbers between 0
and len(l) - 1
. But it picks those numbers based on its internal state, and it can take only a finite (and fixed) number of states. Likewise, the number of possible seed values you can pass to it is finite. This means that the set of unique sequences of numbers it can generate is also finite; call that set s
. For len(l)! > len(s)
, some permutations can never be generated, because the sequences that correspond to those permutations aren't in s
.
What are the exact lengths at which this becomes a problem? I'm not sure. But for what it's worth, the period of the mersenne twister, as implemented by random
, is 2**19937-1. The shuffle docs reiterate my point in a general way; see also what Wikipedia has to say on the matter here.
I had one implementation of nth_permutation (not sure from where I got it) which I modified for your purpose. I believe this would be fast enough to suit your need
>>> def get_nth_permutation(population):
total_permutations = math.factorial(len(population))
while True:
temp_population = population[:]
n = random.randint(1,total_permutations)
size = len(temp_population)
def generate(s,n,population):
for x in range(s-1,-1,-1):
fact = math.factorial(x)
d = n/fact
n -= d * fact
yield temp_population[d]
temp_population.pop(d)
next_perm = generate(size,n,population)
yield [e for e in next_perm]
>>> nth_perm = get_nth_permutation(range(21))
>>> [next(nth_perm) for k in range(1,10)]