可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have a program where I'm keeping track of the success of various things using collections.Counter
— each success of a thing increments the corresponding counter:
import collections
scoreboard = collections.Counter()
if test(thing):
scoreboard[thing]+ = 1
Then, for future tests, I want to skew towards things which have generated the most success. Counter.elements()
seemed ideal for this, since it returns the elements (in arbitrary order) repeated a number of times equal to the count. So I figured I could just do:
import random
nextthing=random.choice(scoreboard.elements())
But no, that raises TypeError: object of type 'itertools.chain' has no len(). Okay, so random.choice
can't work with iterators. But, in this case, the length is known (or knowable) — it's sum(scoreboard.values())
.
I know the basic algorithm for iterating through a list of unknown length and fairly picking an element at random, but I suspect that there's something more elegant. What should I be doing here?
回答1:
You can do this rather easily by using itertools.islice
to get the Nth item of an iterable:
>>> import random
>>> import itertools
>>> import collections
>>> c = collections.Counter({'a': 2, 'b': 1})
>>> i = random.randrange(sum(c.values()))
>>> next(itertools.islice(c.elements(), i, None))
'a'
回答2:
You could wrap the iterator in list()
to convert it into a list for random.choice()
:
nextthing = random.choice(list(scoreboard.elements()))
The downside here is that this expands the list in memory, rather than accessing it item-by-item as would normally get with an iterator.
If you wanted to solve this iteratively, this algorithm is probably a good choice.
回答3:
The following will get a random item where the score is the weighting for how often to return that item.
import random
def get_random_item_weighted(scoreboard):
total_scoreboard_value = sum(scoreboard.values())
item_loc = random.random() * total_scoreboard_value
current_loc = 0
for item, score in scoreboard.items():
current_loc += score
if current_loc > item_loc:
return item
for instance, if there are 2 items:
item1 has a score 5
item2 has a score 10
item2 will be returned twice as often as item1
回答4:
Given a dictionary of choices with corresponding relative probabilities (can be the count in your case), you can use the new random.choices
added in Python 3.6 like so:
import random
my_dict = {
"choice a" : 1, # will in this case be chosen 1/3 of the time
"choice b" : 2, # will in this case be chosen 2/3 of the time
}
choice = random.choices(*zip(*my_dict.items()))[0]
回答5:
Another variant,
Setup is a bit cumbersome, but lookup is in logarithmic complexity (suitable when several lookups are needed):
import itertools
import random
from collections import Counter
from bisect import bisect
counter = Counter({"a": 5, "b": 1, "c": 1})
#setup
most_common = counter.most_common()
accumulated = list(itertools.accumulate([x[1] for x in most_common])) # i.e. [5, 6, 7]
total_size = accumulated[-1]
# lookup
i = random.randrange(total_size)
print(most_common[bisect(accumulated, i)])
回答6:
Another variant with iteration:
import collections
from collections import Counter
import random
class CounterElementsRandomAccess(collections.Sequence):
def __init__(self, counter):
self._counter = counter
def __len__(self):
return sum(self._counter.values())
def __getitem__(self, item):
for i, el in enumerate(self._counter.elements()):
if i == item:
return el
scoreboard = Counter('AAAASDFQWERQWEQWREAAAAABBBBCCDDVBSDF')
score_elements = CounterElementsRandomAccess(scoreboard)
for i in range(10):
print random.choice(score_elements)