Generate a list of random weighted tuples from a l

2019-06-14 16:43发布

问题:

Given a tuple list a:

a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]

We can count how many appearances of each tuple we have using Counter:

>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1}

Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.

For instance, (10, 16) is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.

I have tried to use np.random.choice:

a[np.random.choice(len(a), 3, p=b/len(a))]

But I'm not able to generate the tuples.

Im trying:

a =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print "counter list"
print b
for item in b:
    print "item from current list"
    print item
    print "prob of the item"
    print (float(b[item])/float(len(a)))

    c.append(float(b[item])/float(len(a)))

print "prob list"
print c

print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))

In this case im getting the random indexes of the array.

  • Is there any more optimized way not to have to calculate the probabilities array?

  • Also there is an issue that is that the prob array does not correspond to the Counter array.

回答1:

If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.

e.g.

import random
a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)

As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).



回答2:

This will do the trick

from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random

listOfNumbers =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres=[]
for k,v in b.most_common():
    c.append(float(v)/float(len(listOfNumbers)))
    pres.append(k)

resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)

ass=[]
for res in resultIndex:
    ass.append(pres[res])

print ass

Now is just to see if is there any way to optimize it.



回答3:

You can repeat the following steps 3 times:

  1. Randomly chose a number i in range [0..n-1] where n is a current number of elements in a.
  2. Find a tuple on i-th position in the initial a list. Add tuple to resulting triplet.
  3. Remove all occurrences of tuple from a.

Pay attention to a corner case when a can be empty.

The overall time complexity will be O(n) for a list.

On the first step number i should be generated according to uniform distribution which regular random provides. The more occurrences of a particular tuple are in a the more likely it will be chosen.