How do I use random.shuffle() on a generator without initializing a list from the generator?
Is that even possible? If not, how else should I use random.shuffle() on my list?
>>> import random
>>> random.seed(2)
>>> x = [1,2,3,4,5,6,7,8,9]
>>> def yielding(ls):
...     for i in ls:
...         yield i
...
>>> for i in random.shuffle(yielding(x)):
...     print i
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/random.py", line 287, in shuffle
    for i in reversed(xrange(1, len(x))):
TypeError: object of type 'generator' has no len()
Note: I call random.seed() so that the script produces the same output on every run.
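In order to shuffle the sequence uniformly, random.shuffle() needs to know how long the input is. A generator cannot provide this, so you have to materialize it into a list first. Also note that random.shuffle() shuffles the list in place and returns None, so you iterate over the list afterwards, not over the return value of the call.

You could, instead, use sorted() with a key function that calls random.random() (for example lambda _: random.random(), since random.random() itself takes no arguments), but since this also produces a list, there is little point in going this route.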
Depending on the case, if you know how much data you have ahead of time, you can index the data and compute or read each item on demand, following a shuffled list of indices. This amounts to "don't use a generator for this problem", and without a specific use case it's hard to come up with a general method.
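As one illustration of reading data through a shuffled index, here is a sketch that assumes newline-delimited records in a file opened in binary mode. Only the starting byte offset of each record is kept in memory; each record is read when its turn comes. The helper names line_offsets and read_shuffled are hypothetical, and io.BytesIO stands in for a real file opened with open(path, "rb"):

```python
import io
import random

def line_offsets(f):
    """Return the byte offset at which each line of a binary file starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        if not f.readline():  # b"" means end of file
            break
        offsets.append(pos)
    return offsets

def read_shuffled(f, offsets):
    """Yield the lines of f in random order, reading each one only when needed."""
    # random.sample returns a new, shuffled list of the offsets (not of the records).
    for pos in random.sample(offsets, len(offsets)):
        f.seek(pos)
        yield f.readline().rstrip(b"\n")

# Stand-in for a large file opened with open(path, "rb").
data = io.BytesIO(b"alpha\nbeta\ngamma\ndelta\n")
offsets = line_offsets(data)
for record in read_shuffled(data, offsets):
    print(record.decode())
```

The memory cost is one integer per record instead of the records themselves; the price is one seek per item.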
Alternatively, if you need to use the generator, it depends on how shuffled you want the data to be. As others have pointed out, generators don't have a length, so a perfect shuffle requires evaluating the whole generator at some point, which could be expensive. If you don't need perfect randomness, you can introduce a shuffle buffer:
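A minimal sketch of such a buffer, assuming the chunk-at-a-time behaviour described below: collect buffer_size items, shuffle them, yield them, and repeat. The function name buffered_shuffle is hypothetical; buffer_size matches the parameter named in the text:

```python
import random

def buffered_shuffle(iterable, buffer_size, rng=random):
    """Yield items from iterable, shuffled within consecutive chunks of buffer_size."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer = []
    for item in iterable:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Chunk is full: shuffle it, hand it out, and start a new one.
            rng.shuffle(buffer)
            yield from buffer
            buffer = []
    # Shuffle and flush whatever is left in the final, partial chunk.
    rng.shuffle(buffer)
    yield from buffer

for i in buffered_shuffle(iter(range(10)), buffer_size=3):
    print(i)
```

At most buffer_size items are held at once, which is the whole point; the trade-off is that an item can never move outside its own chunk.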
This will shuffle data in chunks of buffer_size, so you can avoid memory issues if that is your limiting factor. Of course, this is not a truly random shuffle, since items are only shuffled within their own chunk, so it shouldn't be used on data that is sorted; but if you just need to add some randomness to your data, this may be a good solution.

I needed a solution to this problem so I could get expensive-to-compute elements in a shuffled order without wasting computation by generating all the values up front. This is what I came up with for your example. It involves writing another function that indexes into the first array.
You will need NumPy installed.
The code:
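The original code is not reproduced here; what follows is a sketch of the approach described, assuming NumPy 1.17 or later for np.random.default_rng. A shuffled permutation of indices drives a helper that produces element i only when it is requested. The names element and shuffled_elements are hypothetical, and element() just indexes x where a real use case would do its expensive computation:

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def element(i):
    # Hypothetical helper: this is where the expensive per-item work would go.
    # Here it only indexes into x.
    return x[i]

def shuffled_elements(n, get, seed=None):
    """Yield get(i) for every i in range(n), visiting the indices in random order."""
    rng = np.random.default_rng(seed)
    for i in rng.permutation(n):  # a shuffled copy of arange(n)
        yield get(int(i))         # convert NumPy integers to plain Python ints

for value in shuffled_elements(len(x), element, seed=2):
    print(value)
```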
It's not possible to randomize the order in which a generator yields its items without temporarily saving all the elements somewhere. Luckily, this is pretty easy in Python:
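One way to package this is a generator wrapper that stores everything with list(), shuffles it, and then yields the items one by one. The name shuffled is hypothetical; the demo feeds it a generator expression, but yielding(x) from the question would work the same way:

```python
import random

def shuffled(iterable):
    """Yield the items of iterable in a uniformly random order.

    list() consumes the entire iterable first, so every item is held in memory.
    """
    items = list(iterable)
    random.shuffle(items)  # in place; returns None
    yield from items

for i in shuffled(n * n for n in range(5)):
    print(i)
```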
Note the call to list(), which reads all items and puts them into a list.

If you don't want to or can't store all elements, you will need to change the generator itself to yield in a random order.
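As an illustration only (not from the original answer), the question's yielding() can be changed to walk its list in a scrambled order while keeping only a few integers of state, using a full-period linear congruential generator over the next power of two. This assumes the length is known, and the resulting order is far from a uniform shuffle (with a == 1 it degenerates into a fixed stride). The names lcg_permutation and yielding_random are hypothetical:

```python
import random

def lcg_permutation(n, seed=None):
    """Yield every integer in range(n) exactly once, in a scrambled order.

    Only a few integers of state are kept, however large n is. Values are
    produced by an LCG over the next power of two m >= n; values >= n are
    skipped. The order is NOT a uniform random shuffle.
    """
    if n <= 0:
        return
    rng = random.Random(seed)
    m = 1
    while m < n:
        m *= 2
    # Hull-Dobell theorem: with m a power of two, c odd and a % 4 == 1 are
    # sufficient for x -> (a*x + c) % m to visit every residue once per period.
    a = 4 * rng.randrange(m) + 1
    c = 2 * rng.randrange(m) + 1
    x = rng.randrange(m)
    for _ in range(m):
        if x < n:
            yield x
        x = (a * x + c) % m

def yielding_random(ls, seed=None):
    """The question's generator, changed to yield ls in a scrambled order."""
    for i in lcg_permutation(len(ls), seed):
        yield ls[i]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in yielding_random(x, seed=2):
    print(i)
```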