Given a tuple list a
:
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
We can count how many appearances of each tuple we have using Counter
:
>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1}
Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.
For instance, (10, 16)
is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.
I have tried to use np.random.choice
:
a[np.random.choice(len(a), 3, p=b/len(a))]
But I'm not able to generate the tuples.
Im trying:
a =[(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print "counter list"
print b
for item in b:
print "item from current list"
print item
print "prob of the item"
print (float(b[item])/float(len(a)))
c.append(float(b[item])/float(len(a)))
print "prob list"
print c
print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))
In this case im getting the random indexes of the array.
If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.
e.g.
import random
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)
As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).
This will do the trick
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random
listOfNumbers =[(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres=[]
for k,v in b.most_common():
c.append(float(v)/float(len(listOfNumbers)))
pres.append(k)
resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)
ass=[]
for res in resultIndex:
ass.append(pres[res])
print ass
Now is just to see if is there any way to optimize it.
You can repeat the following steps 3 times:
- Randomly chose a number
i
in range [0..n-1]
where n
is a current number of elements in a
.
- Find a
tuple
on i
-th position in the initial a
list. Add tuple
to resulting triplet.
- Remove all occurrences of
tuple
from a
.
Pay attention to a corner case when a
can be empty.
The overall time complexity will be O(n) for a list.
On the first step number i
should be generated according to uniform distribution which regular random
provides. The more occurrences of a particular tuple are in a
the more likely it will be chosen.