I have a list of 100,000 objects. Every list element has a "weight" associated with it that is a positive int from 1 to N.
What is the most efficient way to select a random element from the list? I want the distribution of randomly chosen elements to match the distribution of weights in the list.
For example, if I have a list L = {1,1,2,5}, I want the 4th element to be selected 5/9ths of the time, on average.
Assume inserts and deletes are common on this list, so any approach using "integral area tables" would need to be updated often; I'm hoping for a solution with O(1) runtime and O(1) extra memory.
This is what I did to solve it:
A solution that runs in O(n) is to start by selecting the first element. Then, for each following element, either keep the element you have or replace it with the next one. Let w be the sum of the weights of all elements considered so far (not counting the next one). Then keep the old one with probability w/(w+x) and switch to the new one with probability x/(w+x), where x is the weight of the next element.
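A minimal sketch of this single pass in Python (the function name and the assumption that the weights arrive alongside the items are mine):

    import random

    def pick_weighted_one_pass(items, weights):
        """Return items[i] with probability weights[i] / sum(weights)."""
        chosen = None
        total = 0
        for item, x in zip(items, weights):
            total += x
            # total is now w + x, so this replaces the current pick
            # with probability x / (w + x) and keeps it with w / (w + x).
            if random.uniform(0, total) < x:
                chosen = item
        return chosen

For the example L = {1,1,2,5}, pick_weighted_one_pass(range(4), [1, 1, 2, 5]) returns index 3 about 5/9 of the time.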
I really like jonderry's solution, but I'm wondering whether this problem needs a structure as complex as the augmented binary search tree. What if we kept two arrays, one with the input weights, say a = {1,1,2,5}, and one with the cumulative weights (a very similar idea to jonderry's solution), which would be b = {1,2,4,9}? Now generate a random number x in [1, 9] and binary search for it in the cumulative-sum array: the location i where b[i] >= x and b[i-1] < x is noted, and a[i] is returned. So, if the random number were 3, we would get i = 3, and a[3] = 2 would be returned. This gives the same sampling complexity as the augmented tree solution with an easier implementation, though an insert or delete costs O(n) to rebuild the cumulative array.
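In Python (0-indexed, whereas the indices above are 1-based), this might look like the following, with bisect doing the binary search:

    import bisect
    import itertools
    import random

    a = [1, 1, 2, 5]                    # input weights
    b = list(itertools.accumulate(a))   # cumulative sums: [1, 2, 4, 9]

    def pick_cumulative(a, b):
        """Return a[i] with probability a[i] / b[-1]."""
        x = random.randint(1, b[-1])    # uniform integer in [1, 9]
        i = bisect.bisect_left(b, x)    # smallest i with b[i] >= x
        return a[i]

For x = 3, bisect_left returns the 0-based index 2 (the third entry), and a[2] = 2 comes back, matching the example.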
If you know the sum of weights (in your case, 9) AND you use a random-access data structure (list implies O(n) access time), then it can be done fast:
1) Select a random element (O(1)). Since each element has a 1/num_elems chance of being selected at this step, we can scale the acceptance probability in step 2) by num_elems, which accelerates the algorithm.

2) Compute its acceptance probability: num_elems * (weight / total_weight).

3) Take a random number in the range 0..1; if it's less than the acceptance probability, you have the output. If not, repeat from step 1).
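A sketch of this loop in Python. One caveat: the step-2) probability num_elems * (weight / total_weight) can exceed 1 when a weight is above the average, so the sketch uses the standard weight / max_weight acceptance test instead, which keeps the same rejection structure but is safe for any weights:

    import random

    def pick_rejection(items, weights, max_weight):
        """Return items[i] with probability weights[i] / sum(weights).
        max_weight must be an upper bound on every weight; it can be
        maintained incrementally as elements are inserted."""
        while True:
            i = random.randrange(len(items))   # O(1) uniform pick
            # Accept with probability weights[i] / max_weight.
            if random.random() < weights[i] / max_weight:
                return items[i]

The expected number of iterations is num_elems * max_weight / total_weight, so each sample is effectively O(1) as long as the weights are not too skewed.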
You can use an augmented binary search tree to store the elements, along with the sum of the weights in each subtree. This lets you insert and delete elements and weights however you want. Both sampling and updates require O(lg n) time per operation, and space usage is O(n).
Sampling is accomplished by generating a random integer in [1, S], where S is the sum of all weights (S is stored at the root of the tree), and performing binary search using the weight-sums stored for each subtree.
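A minimal sketch of that descent, assuming a plain unbalanced BST for brevity (a real implementation would use a self-balancing tree such as a red-black tree or a treap to guarantee the O(lg n) bounds):

    import random

    class Node:
        def __init__(self, key, weight):
            self.key = key
            self.weight = weight
            self.left = None
            self.right = None
            self.subtree_sum = weight          # this node plus both subtrees

    def insert(root, key, weight):
        """Unbalanced insert; a balanced tree would also rotate here."""
        if root is None:
            return Node(key, weight)
        if key < root.key:
            root.left = insert(root.left, key, weight)
        else:
            root.right = insert(root.right, key, weight)
        root.subtree_sum += weight             # maintain sums along the path
        return root

    def sample(root):
        """Return a key with probability weight / S, S = root.subtree_sum."""
        r = random.randint(1, root.subtree_sum)    # uniform in [1, S]
        node = root
        while True:
            left_sum = node.left.subtree_sum if node.left else 0
            if r <= left_sum:
                node = node.left                   # target is in the left subtree
            elif r <= left_sum + node.weight:
                return node.key                    # landed on this node
            else:
                r -= left_sum + node.weight        # skip left subtree and node
                node = node.right

Inserting the example weights 1, 1, 2, 5 (say, under keys 0..3) and calling sample repeatedly returns the weight-5 key about 5/9 of the time.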