Generate random numbers with a given (numerical) d-第2页回答

I have a file with some probabilities for different values e.g.:

I would like to generate random numbers using this distribution. Does an existing module that handles this exist? It's fairly simple to code on your own (build the cumulative density function, generate a random value [0,1] and pick the corresponding value) but it seems like this should be a common problem and probably someone has created a function/module for it.

I need this because I want to generate a list of birthdays (which do not follow any distribution in the standard random module).

标签： python module random

12条回答

君临天下

2楼-- · 2019-01-01 09:02

based on other solutions, you generate accumulative distribution (as integer or float whatever you like), then you can use bisect to make it fast

this is a simple example (I used integers here)

l=[(20, 'foo'), (60, 'banana'), (10, 'monkey'), (10, 'monkey2')]
def get_cdf(l):
    ret=[]
    c=0
    for i in l: c+=i[0]; ret.append((c, i[1]))
    return ret

def get_random_item(cdf):
    return cdf[bisect.bisect_left(cdf, (random.randint(0, cdf[-1][0]),))][1]

cdf=get_cdf(l)
for i in range(100): print get_random_item(cdf),

the get_cdf function would convert it from 20, 60, 10, 10 into 20, 20+60, 20+60+10, 20+60+10+10

now we pick a random number up to 20+60+10+10 using random.randint then we use bisect to get the actual value in a fast way

0人赞添加讨论(0) 举报

浅入江南

3楼-- · 2019-01-01 09:02

None of these answers is particularly clear or simple.

Here is a clear, simple method that is guaranteed to work.

accumulate_normalize_probabilities takes a dictionary p that maps symbols to probabilities OR frequencies. It outputs usable list of tuples from which to do selection.

def accumulate_normalize_values(p):
        pi = p.items() if isinstance(p,dict) else p
        accum_pi = []
        accum = 0
        for i in pi:
                accum_pi.append((i[0],i[1]+accum))
                accum += i[1]
        if accum == 0:
                raise Exception( "You are about to explode the universe. Continue ? Y/N " )
        normed_a = []
        for a in accum_pi:
                normed_a.append((a[0],a[1]*1.0/accum))
        return normed_a

Yields:

>>> accumulate_normalize_values( { 'a': 100, 'b' : 300, 'c' : 400, 'd' : 200  } )
[('a', 0.1), ('c', 0.5), ('b', 0.8), ('d', 1.0)]

Why it works

The accumulation step turns each symbol into an interval between itself and the previous symbols probability or frequency (or 0 in the case of the first symbol). These intervals can be used to select from (and thus sample the provided distribution) by simply stepping through the list until the random number in interval 0.0 -> 1.0 (prepared earlier) is less or equal to the current symbol's interval end-point.

The normalization releases us from the need to make sure everything sums to some value. After normalization the "vector" of probabilities sums to 1.0.

The rest of the code for selection and generating a arbitrarily long sample from the distribution is below :

def select(symbol_intervals,random):
        print symbol_intervals,random
        i = 0
        while random > symbol_intervals[i][1]:
                i += 1
                if i >= len(symbol_intervals):
                        raise Exception( "What did you DO to that poor list?" )
        return symbol_intervals[i][0]


def gen_random(alphabet,length,probabilities=None):
        from random import random
        from itertools import repeat
        if probabilities is None:
                probabilities = dict(zip(alphabet,repeat(1.0)))
        elif len(probabilities) > 0 and isinstance(probabilities[0],(int,long,float)):
                probabilities = dict(zip(alphabet,probabilities)) #ordered
        usable_probabilities = accumulate_normalize_values(probabilities)
        gen = []
        while len(gen) < length:
                gen.append(select(usable_probabilities,random()))
        return gen

Usage :

>>> gen_random (['a','b','c','d'],10,[100,300,400,200])
['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c']   #<--- some of the time

0人赞添加讨论(0) 举报

孤独总比滥情好

4楼-- · 2019-01-01 09:03

An advantage to generating the list using CDF is that you can use binary search. While you need O(n) time and space for preprocessing, you can get k numbers in O(k log n). Since normal Python lists are inefficient, you can use array module.

If you insist on constant space, you can do the following; O(n) time, O(1) space.

def random_distr(l):
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # Might occur because of floating point inaccuracies

0人赞添加讨论(0) 举报

其实，你不懂

5楼-- · 2019-01-01 09:13

you might want to have a look at NumPy Random sampling distributions

0人赞添加讨论(0) 举报

骚的不知所云

6楼-- · 2019-01-01 09:14

Make a list of items, based on their weights:

items = [1, 2, 3, 4, 5, 6]
probabilities= [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
# if the list of probs is normalized (sum(probs) == 1), omit this part
prob = sum(probabilities) # find sum of probs, to normalize them
c = (1.0)/prob # a multiplier to make a list of normalized probs
probabilities = map(lambda x: c*x, probabilities)
print probabilities

ml = max(probabilities, key=lambda x: len(str(x)) - str(x).find('.'))
ml = len(str(ml)) - str(ml).find('.') -1
amounts = [ int(x*(10**ml)) for x in probabilities]
itemsList = list()
for i in range(0, len(items)): # iterate through original items
  itemsList += items[i:i+1]*amounts[i]

# choose from itemsList randomly
print itemsList

An optimization may be to normalize amounts by the greatest common divisor, to make the target list smaller.

Also, this might be interesting.

0人赞添加讨论(0) 举报

听够珍惜

7楼-- · 2019-01-01 09:17

Since Python 3.6, there's a solution for this in Python's standard library, namely random.choices.

Example usage: let's set up a population and weights matching those in the OP's question:

>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

Now choices(population, weights) generates a single sample:

>>> choices(population, weights)
4

The optional keyword-only argument k allows one to request more than one sample at once. This is valuable because there's some preparatory work that random.choices has to do every time it's called, prior to generating any samples; by generating many samples at once, we only have to do that preparatory work once. Here we generate a million samples, and use collections.Counter to check that the distribution we get roughly matches the weights we gave.

>>> million_samples = choices(population, weights, k=10**6)
>>> from collections import Counter
>>> Counter(million_samples)
Counter({5: 399616, 6: 200387, 4: 200117, 1: 99636, 3: 50219, 2: 50025})

0人赞添加讨论(0) 举报

上一页 1 2

Generate random numbers with a given (numerical) d

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间