可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

The Zipf probability distribution is often used to model file size distribution or item access distributions on items in P2P systems. e.g. "Web Caching and Zip like Distribution Evidence and Implications", but neither Boost or the GSL (Gnu Scientific Library) provide an implementation to generate random numbers using this distribution. I have not found a (trustworthy) implementation using the common search engines.

How can random numbers that are distributed according to the Zipf distribution by using a U(0,1) random generator, e.g. the Mersenne twister?

回答1:

zipfR is a free and open source library implemented with R. VGAM is another R package that also implements Zipf.

It's also worth noting that the Gnu Scientific Library has an implementation of the Pareto distribution which is effectively the continuous analogue of the discrete Zipf distribution.

Also, the Zeta distribution is equivalent to Zipf for infinite N. The GSL has an implementation of the Riemann zeta function, so you could use that to construct the distribution yourself.

回答2:

numpy.random.zipf generates Zipf samples using python.

回答3:

Here's a Python Zipf-like distribution generator for n items with parameter alpha >= 0:

import random 
import bisect 
import math 

class ZipfGenerator: 

    def __init__(self, n, alpha): 
        # Calculate Zeta values from 1 to n: 
        tmp = [1. / (math.pow(float(i), alpha)) for i in range(1, n+1)] 
        zeta = reduce(lambda sums, x: sums + [sums[-1] + x], tmp, [0]) 

        # Store the translation map: 
        self.distMap = [x / zeta[-1] for x in zeta] 

    def next(self): 
        # Take a uniform 0-1 pseudo-random value: 
        u = random.random()  

        # Translate the Zipf variable: 
        return bisect.bisect(self.distMap, u) - 1

回答4:

A very efficient algorithm to generate Zipf distributed random variates was recently developed for the next versions (>= 3.6) of the Apache Commons Math library (see code here). It makes use of rejection-inversion sampling and also works for exponents less than 1. It does not require precalculating the CDF and keeping it in memory. Furthermore, the costs for generating one sample are constant and do not increase with the number of items.