Generate random numbers distributed by Zipf

2019-03-12 10:49发布

The Zipf probability distribution is often used to model file size distribution or item access distributions on items in P2P systems. e.g. "Web Caching and Zip like Distribution Evidence and Implications", but neither Boost or the GSL (Gnu Scientific Library) provide an implementation to generate random numbers using this distribution. I have not found a (trustworthy) implementation using the common search engines.

How can random numbers that are distributed according to the Zipf distribution by using a U(0,1) random generator, e.g. the Mersenne twister?

5条回答
Juvenile、少年°
2楼-- · 2019-03-12 11:04

Here's a Python Zipf-like distribution generator for n items with parameter alpha >= 0:

import random 
import bisect 
import math 

class ZipfGenerator: 

    def __init__(self, n, alpha): 
        # Calculate Zeta values from 1 to n: 
        tmp = [1. / (math.pow(float(i), alpha)) for i in range(1, n+1)] 
        zeta = reduce(lambda sums, x: sums + [sums[-1] + x], tmp, [0]) 

        # Store the translation map: 
        self.distMap = [x / zeta[-1] for x in zeta] 

    def next(self): 
        # Take a uniform 0-1 pseudo-random value: 
        u = random.random()  

        # Translate the Zipf variable: 
        return bisect.bisect(self.distMap, u) - 1
查看更多
不美不萌又怎样
3楼-- · 2019-03-12 11:05

zipfR is a free and open source library implemented with R. VGAM is another R package that also implements Zipf.

It's also worth noting that the Gnu Scientific Library has an implementation of the Pareto distribution which is effectively the continuous analogue of the discrete Zipf distribution.

Also, the Zeta distribution is equivalent to Zipf for infinite N. The GSL has an implementation of the Riemann zeta function, so you could use that to construct the distribution yourself.

查看更多
ら.Afraid
4楼-- · 2019-03-12 11:06

A very efficient algorithm to generate Zipf distributed random variates was recently developed for the next versions (>= 3.6) of the Apache Commons Math library (see code here). It makes use of rejection-inversion sampling and also works for exponents less than 1. It does not require precalculating the CDF and keeping it in memory. Furthermore, the costs for generating one sample are constant and do not increase with the number of items.

查看更多
可以哭但决不认输i
5楼-- · 2019-03-12 11:06

We were discussing the answer of @stanga in this thread. There are some nice optimizations suggested for his algorithm.

查看更多
走好不送
6楼-- · 2019-03-12 11:20

numpy.random.zipf generates Zipf samples using python.

查看更多
登录 后发表回答