Is there a random number distribution that obeys Benford's Law?

Posted 2019-02-21 15:45

Question:

Python has a number of ways to generate different distributions of random numbers; see the documentation for the random module. Unfortunately, they aren't terribly understandable without the appropriate math background, especially when it comes to the required parameters.

I'd like to know whether any of those methods can produce random numbers whose distribution obeys Benford's Law, and what parameter values are appropriate. Namely, for a population of integers, about 30% should start with a '1', about 18% with a '2', and so on.
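For reference, Benford's Law assigns leading digit d the probability log10(1 + 1/d). The short sketch below computes those target percentages; the function name benford_probabilities is illustrative and not part of the question or the random module.

```python
import math


def benford_probabilities() -> dict[int, float]:
    """Expected share of each leading digit d = 1..9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine values sum to log10(10) = 1
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


for digit, p in benford_probabilities().items():
    print(f"{digit}: {p:.1%}")
```

It prints 30.1% for 1 and 17.6% for 2, falling to 4.6% for 9; these are the same percentages Answer 2 hard-codes in its probs list.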


Using Jan Dvorak's answer, I put together the following code, and it appears to work perfectly.

import math
import random

def benfords_range_gen(stop, n):
    """A generator that yields n random integers
    between 1 and stop-1 whose distribution
    follows Benford's Law, i.e. is logarithmic.
    """
    multiplier = math.log(stop)
    for _ in range(n):
        # exp(log(stop) * U) is log-uniform on [1, stop) for U uniform on [0, 1)
        yield int(math.exp(multiplier * random.random()))

>>> from collections import Counter
>>> Counter(str(i)[0] for i in benfords_range_gen(10000, 1000000))
Counter({'1': 300696, '2': 176142, '3': 124577, '4': 96756, '5': 79260, '6': 67413, '7': 58052, '8': 51308, '9': 45796})
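To see why the counts above land so close to Benford's percentages, one can compute the exact leading-digit probabilities of the generator's output instead of sampling. This is a sketch that treats random.random() as an ideal uniform value on [0, 1) and ignores floating-point effects; exact_leading_digit_probs is a hypothetical helper name.

```python
import math


def exact_leading_digit_probs(stop: int) -> dict[int, float]:
    """Leading-digit probabilities of int(math.exp(math.log(stop) * U)),
    with U uniform on [0, 1), i.e. the values benfords_range_gen(stop, n) yields.
    Assumes stop >= 2."""
    scale = math.log(stop)
    probs = {d: 0.0 for d in range(1, 10)}
    for k in range(1, stop):
        # The output equals k exactly when k <= exp(scale * U) < k + 1,
        # which happens with probability (log(k + 1) - log(k)) / log(stop)
        probs[int(str(k)[0])] += (math.log(k + 1) - math.log(k)) / scale
    return probs


for d, p in exact_leading_digit_probs(10000).items():
    # exact probability next to the Benford value log10(1 + 1/d)
    print(d, f"{p:.4f}", f"{math.log10(1 + 1 / d):.4f}")
```

With stop = 10000 the range [1, 10000) covers four whole decades, so the sums telescope to exactly log10(1 + 1/d); the same function can be used to check how closely other stop values follow the law.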

Answer 1:

Benford's law describes the distribution of the first digits of a set of numbers when those numbers are drawn from a wide range on a logarithmic scale. A log-uniform distribution over a single decade follows the law as well: 10^x, with x drawn uniformly from [0, 1), produces that distribution.
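The one-decade argument can be written out as a short derivation (U and X are symbols introduced here for illustration). Because X lies in [1, 10), its leading digit is simply its floor:

```latex
U \sim \mathrm{Uniform}[0, 1), \qquad X = 10^{U} \in [1, 10)

P(\lfloor X \rfloor = d) = P\left(d \le 10^{U} < d + 1\right)
                         = P\left(\log_{10} d \le U < \log_{10}(d + 1)\right)
                         = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, 9
```

The last step uses P(a <= U < b) = b - a for 0 <= a < b <= 1, which gives exactly Benford's probability for each digit.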

The following expression produces a leading digit from 1 to 9 with the desired distribution: math.floor(10**random.random())
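A quick empirical check of that one-liner might look like the sketch below. It assumes a seeded random.Random instance is acceptable so runs are reproducible; benford_digit and sample_digit_frequencies are hypothetical helper names.

```python
import math
import random
from collections import Counter


def benford_digit(rng: random.Random) -> int:
    # 10 ** U lies in [1, 10) for U in [0, 1), so its floor is a digit 1..9
    return math.floor(10 ** rng.random())


def sample_digit_frequencies(n: int, seed: int = 0) -> dict[int, float]:
    # Seeded generator so repeated runs give identical counts
    rng = random.Random(seed)
    counts = Counter(benford_digit(rng) for _ in range(n))
    return {d: counts[d] / n for d in range(1, 10)}


freqs = sample_digit_frequencies(100_000)
for d in range(1, 10):
    # sampled share next to the Benford value log10(1 + 1/d)
    print(d, f"{freqs[d]:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

With 100,000 draws, each sampled share should sit close to the Benford value printed beside it.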



Answer 2:

Just playing around.

A much less efficient, but perhaps easier to follow, implementation for those who, like myself, are not so mathematically inclined:

An easy way to create any desired distribution is to fill a list in which each item appears in proportion to its desired percentage, and then use random.choice(<list>), since it picks every element of the list with equal probability.

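import random

# Benford percentages for leading digits 1..9
probs = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Repeat each digit in proportion to its percentage (1000 entries in total);
# round() guards against a float product like p * 10 landing just below
# an integer and being truncated by int()
population = [n for n, p in zip(nums, probs) for _ in range(round(p * 10))]

max_value = 100
min_value = 1
result_pop = []
target_pop_size = 1000
while len(result_pop) < target_pop_size:
    # Pick a leading digit with Benford weights...
    s = str(random.choice(population))
    # ...then draw uniformly from the range until a number starts with it
    while True:
        r = random.randint(min_value, max_value)
        if str(r).startswith(s):
            break
    result_pop.append(r)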
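The same idea can be sketched without the 1000-entry list or the rejection loop by using random.choices, which accepts relative weights directly (available since Python 3.6). This is a swapped-in variant, not the answerer's code: benford_ints and BENFORD_PCT are hypothetical names, and it assumes min_value is at least 1 so every number has a nonzero leading digit. It groups the range by leading digit once, so no draws are thrown away.

```python
import random
from typing import Optional

# Benford percentages for leading digits 1..9 (same values as the probs list above)
BENFORD_PCT = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]


def benford_ints(min_value: int, max_value: int, k: int,
                 seed: Optional[int] = None) -> list[int]:
    """Hypothetical helper: k integers in [min_value, max_value] whose
    leading digits are weighted by BENFORD_PCT."""
    if min_value < 1:
        raise ValueError("min_value must be at least 1")
    rng = random.Random(seed)
    # Group every number in the range by its leading digit, once
    buckets = {d: [] for d in range(1, 10)}
    for n in range(min_value, max_value + 1):
        buckets[int(str(n)[0])].append(n)
    # Keep only digits that actually lead some number in the range
    digits = [d for d in range(1, 10) if buckets[d]]
    weights = [BENFORD_PCT[d - 1] for d in digits]
    # Pick leading digits by weight, then a number uniformly within each bucket
    chosen = rng.choices(digits, weights=weights, k=k)
    return [rng.choice(buckets[d]) for d in chosen]


sample = benford_ints(1, 100, 1000, seed=42)
print(sample[:10])
```

Like the loop above, numbers sharing a leading digit are equally likely. Digits that never lead a number in the range are simply dropped, and since random.choices treats weights as relative, the remaining ones are renormalized implicitly.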