Weighted random sample without replacement in Python

Posted 2019-06-27 03:52

Question:

I need to obtain a k-sized sample without replacement from a population, where each member of the population has an associated weight (W).

Python's random.choices only samples with replacement, and random.sample won't take a weighted input.

Currently, this is what I am using:

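import random
import numpy as np

# population, W and Parent_number are defined elsewhere.
# Draw one item at a time and keep it only if it has not been picked yet.
P = np.zeros((1, Parent_number))
n = 0
while n < Parent_number:
    draw = random.choices(population, weights=W, k=1)
    # Compare only against the slots already filled, not the zero padding.
    if draw[0] not in P[0, :n]:
        P[0, n] = draw[0]
        n += 1
P = np.asarray(sorted(P[0]))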

While this works, it requires switching back and forth between arrays and lists, and is therefore less than ideal.

I am looking for the simplest and easiest to understand solution as this code will be shared with others.

Answer 1:

You can use np.random.choice with replace=False as follows:

np.random.choice(vec,size,replace=False, p=P)

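where vec is your population and P is the weight vector. Note that p must be a vector of probabilities that sums to 1, so raw weights have to be divided by their sum first.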

For example:

import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)
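
If the weights are raw values rather than probabilities, they can be normalized before the call. The following is a minimal sketch, assuming NumPy 1.17 or later for the np.random.default_rng generator; the function name weighted_sample_without_replacement and the sample data are made up for illustration and are not part of the original answer.

```python
import numpy as np

def weighted_sample_without_replacement(population, weights, k, seed=None):
    """Draw k distinct items; weights are non-negative and need not sum to 1."""
    rng = np.random.default_rng(seed)
    population = np.asarray(population)
    weights = np.asarray(weights, dtype=float)
    # The p argument must sum to 1, so normalize the raw weights.
    probabilities = weights / weights.sum()
    return rng.choice(population, size=k, replace=False, p=probabilities)

# Hypothetical data: four items with raw weights 5, 1, 3, 1.
sample = weighted_sample_without_replacement([10, 20, 30, 40], [5, 1, 3, 1], k=2, seed=0)
print(np.sort(sample))  # sorted, as in the question
```

The result stays a NumPy array throughout, which avoids the list/array round trips mentioned in the question.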


Answer 2:

For numpy, Miriam Farber's answer is the way to go.

For pure Python, the technique is to pre-weight the population by repeating each element according to its count, and then use random.sample() to extract the values without replacement:

>>> # Extract 10 values without replacement from a population
>>> # of ten heads and four tails.
>>> from random import sample
>>> population = ['heads', 'tails']
>>> counts = [10, 4]
>>> weighted_pop = [elem for elem, cnt in zip(population, counts) for i in range(cnt)]
>>> sample(weighted_pop, k=10)
['heads', 'tails', 'tails', 'heads', 'heads', 'tails', 'heads', 'heads', 'heads', 'heads']

Note that the weights here are really integer counts. This matters because, when you sample without replacement, each selection reduces the remaining count of that element by one.
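
On Python 3.9 or later, random.sample() also accepts a counts keyword, so the expanded list does not have to be built by hand. A short sketch of the same heads-and-tails example under that version assumption (the variable name picked is only illustrative):

```python
import random

population = ['heads', 'tails']
counts = [10, 4]

# counts= requires Python 3.9+. It behaves as if each element were
# repeated counts[i] times in the population before sampling.
picked = random.sample(population, k=10, counts=counts)
print(picked)
```

Because sampling is without replacement, 'tails' can appear at most four times in the result, matching its count.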