Question:

How do I use random.shuffle() on a generator without initializing a list from the generator? Is that even possible? If not, how else should I use random.shuffle() on my list?
```python
>>> import random
>>> random.seed(2)
>>> x = [1,2,3,4,5,6,7,8,9]
>>> def yielding(ls):
...     for i in ls:
...         yield i
...
>>> for i in random.shuffle(yielding(x)):
...     print i
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/random.py", line 287, in shuffle
    for i in reversed(xrange(1, len(x))):
TypeError: object of type 'generator' has no len()
```
Note: is random.seed() designed so that the script returns the same output on every run?
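On the random.seed() note: re-seeding with the same value before shuffling reproduces the same order, at least within one Python version (the random module does not promise identical shuffle results across versions, so Python 2 and Python 3 orders differ for the same seed). A minimal Python 3 check, reusing x from the question; the helper name seeded_shuffle is just for illustration:

```python
import random

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def seeded_shuffle(seed, items):
    """Return a shuffled copy of items after seeding the global generator."""
    copy = list(items)      # shuffle a copy so the input list is untouched
    random.seed(seed)       # reset the global generator's state
    random.shuffle(copy)    # in-place shuffle of the copy
    return copy

first = seeded_shuffle(2, x)
second = seeded_shuffle(2, x)
print(first == second)      # True: same seed, same order
```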
Answer 1:

random.shuffle() shuffles a sequence in place: to do so uniformly it needs to know how long the input is and to swap elements by index. A generator provides neither, so you have to materialize it into a list first:

```python
lst = list(yielding(x))
random.shuffle(lst)
for i in lst:
    print(i)
```
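It is also worth knowing that random.shuffle() returns None, so even with a real list the question's for i in random.shuffle(...) loop would fail, this time with TypeError: 'NoneType' object is not iterable. A small Python 3 check of that behavior:

```python
import random

lst = list(range(1, 10))
result = random.shuffle(lst)    # shuffles lst itself, in place

print(result)                   # None: nothing useful is returned
print(sorted(lst) == list(range(1, 10)))  # True: same elements, new order

for i in lst:                   # loop over the list, not the return value
    print(i)
```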
You could, instead, use sorted() with random.random() as the key:

```python
for i in sorted(yielding(x), key=lambda k: random.random()):
    print(i)
```

But since this also produces a list, there is little point in going this route.
Demo:

```python
>>> import random
>>> x = [1,2,3,4,5,6,7,8,9]
>>> sorted(iter(x), key=lambda k: random.random())
[9, 7, 3, 2, 5, 4, 6, 1, 8]
```
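If you want a shuffled copy rather than an in-place shuffle, the Python documentation for random.shuffle() points to random.sample(x, k=len(x)), which returns a new list and leaves x alone. It still needs a sequence, so the generator has to go through list() first, exactly as above. A short sketch:

```python
import random

def yielding(ls):
    for i in ls:
        yield i

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

items = list(yielding(x))                      # materialize the generator once
shuffled = random.sample(items, k=len(items))  # new list; items is unchanged

for i in shuffled:
    print(i)
```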
Answer 2:

It's not possible to randomize the output order of a generator without temporarily saving all of its elements somewhere. Luckily, this is pretty easy in Python:

```python
tmp = list(yielding(x))
random.shuffle(tmp)
for i in tmp:
    print(i)
```

Note the call to list(), which reads all the items and puts them into a list.

If you can't or don't want to store all the elements, you will need to change the generator itself to yield in a random order.
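One way to do that, as a sketch: if the generator produces item i from an index (here a hypothetical expensive(i) stands in for the real per-item work), shuffle the indices instead of the results. Only the integer indices are stored, and each value is still computed lazily, one at a time:

```python
import random

def expensive(i):
    """Hypothetical stand-in for a costly per-item computation."""
    return i * i

def yielding_random_order(n):
    """Yield expensive(0) ... expensive(n - 1) in a random order."""
    indices = list(range(n))    # stores n small ints, not n results
    random.shuffle(indices)     # uniform random permutation of the indices
    for i in indices:
        yield expensive(i)      # computed only when the caller asks

for value in yielding_random_order(9):
    print(value)
```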
Answer 3:

I needed a solution to this problem so I could get expensive-to-compute elements in a shuffled order without wasting computation on generating all the values up front. This is what I came up with for your example. It adds a second function that indexes into the original list in a random order.

You will need numpy installed:

```
pip install numpy
```

The code:
```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def shuffle_generator(lst):
    # np.random.permutation(n) gives the indices 0..n-1 in random order;
    # each element is looked up only when the generator is advanced
    return (lst[idx] for idx in np.random.permutation(len(lst)))

def yielding(ls):
    for i in ls:
        yield i

for i in yielding(shuffle_generator(x)):
    print(i)
```
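If numpy is not otherwise needed, the same idea works with the standard library alone: random.sample(range(n), n) returns the indices 0 to n-1 in random order. A sketch that keeps the answer's function name:

```python
import random

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def shuffle_generator(lst):
    """Lazily yield the elements of a sequence in random order."""
    n = len(lst)
    # sampling all n indices from a range gives a shuffled list of indices
    return (lst[idx] for idx in random.sample(range(n), n))

for i in shuffle_generator(x):
    print(i)
```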
Answer 4:

Depending on the case, if you know how much data you have ahead of time, you can index the data and compute or read each item based on a shuffled index. This amounts to "don't use a generator for this problem", and without a specific use case it's hard to come up with a general method.
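As an illustration of indexing instead of generating: if the data lives in a file of fixed-size records (the 4-byte record size and layout here are assumptions), the records can be read in shuffled order by seeking to each offset, so only the list of indices is held in memory. io.BytesIO stands in for a real file opened with open(path, "rb"):

```python
import io
import random

RECORD_SIZE = 4  # hypothetical: every record is exactly 4 bytes

def read_records_shuffled(f, record_size):
    """Yield fixed-size records from a seekable binary file in random order."""
    f.seek(0, io.SEEK_END)
    n_records = f.tell() // record_size   # count records from the file size
    order = list(range(n_records))
    random.shuffle(order)                 # shuffle indices, not the data
    for idx in order:
        f.seek(idx * record_size)         # jump straight to the record
        yield f.read(record_size)

# Nine example records: b'0001', b'0002', ..., b'0009'
data = b"".join(b"%04d" % i for i in range(1, 10))
with io.BytesIO(data) as f:
    records = list(read_records_shuffled(f, RECORD_SIZE))
print(records)
```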
Alternatively, if you need to use the generator, it depends on how "shuffled" you want the data to be. As others have pointed out, generators don't have a length, so at some point you need to evaluate the generator, which could be expensive. If you don't need perfect randomness, you can introduce a shuffle buffer:
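```python
from itertools import islice

import numpy as np

def shuffle(generator, buffer_size):
    # iter() matters: islice() on a plain list would restart at index 0
    # on every pass and loop forever
    iterator = iter(generator)
    while True:
        buffer = list(islice(iterator, buffer_size))  # next chunk
        if not buffer:
            break                     # source is exhausted
        np.random.shuffle(buffer)     # shuffle only within this chunk
        yield from buffer

# my_generator is whatever generator or iterable you want to shuffle
shuffled_generator = shuffle(my_generator, 256)
```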
This shuffles the data in chunks of buffer_size items, so you can avoid memory issues if memory is your limiting factor. Of course, this is not a truly random shuffle: each item can only move within its own chunk, so it shouldn't be used on data that is sorted. But if you just need to add some randomness to your data, this may be a good solution.
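A variation, named plainly since it is not the answer's code: a sliding shuffle buffer (the approach behind TensorFlow's Dataset.shuffle) keeps up to buffer_size items, yields a randomly chosen one, and refills that slot from the stream. Memory stays bounded by buffer_size as before, but an item is no longer confined to its own chunk, so the output mixes across chunk boundaries. It is still not a uniform shuffle when buffer_size is smaller than the data. A sketch:

```python
import random
from itertools import islice

def buffered_shuffle(iterable, buffer_size):
    """Yield items in a locally shuffled order using a sliding buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    iterator = iter(iterable)
    buffer = list(islice(iterator, buffer_size))  # fill the buffer first
    for item in iterator:
        idx = random.randrange(len(buffer))       # pick a random slot
        yield buffer[idx]                         # emit that item...
        buffer[idx] = item                        # ...and refill the slot
    random.shuffle(buffer)                        # drain what is left
    yield from buffer

out = list(buffered_shuffle(range(1000), 256))
print(sorted(out) == list(range(1000)))  # True: every item appears once
```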