可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm writing some tests for a C++ command line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get a some numbers very frequently but most of them relatively infrequently.

Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy to use chunk of C/C++ would be great.

Thanks!

回答1:

This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).

The short answer (derivation at the above link):

x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))

where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.

回答2:

If you know the distribution you want (called the Probability Distribution Function (PDF)) and have it properly normalized, you can integrate it to get the Cumulative Distribution Function (CDF), then invert the CDF (if possible) to get the transformation you need from uniform [0,1] distribution to your desired.

So you start by defining the distribution you want.

P = F(x)

(for x in [0,1]) then integrated to give

C(y) = \int_0^y F(x) dx

If this can be inverted you get

y = F^{-1}(C)

So call rand() and plug the result in as C in the last line and use y.

This result is called the Fundamental Theorem of Sampling. This is a hassle because of the normalization requirement and the need to analytically invert the function.

Alternately you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare to the PDF at the location indeicated by your first throw. Reject if the second throw exceeds the PDF. Tends to be inefficient for PDFs with a lot of low probability region, like those with long tails...

An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.

The real stinker here is that simple x^-n distributions are non-normalizable on the range [0,1], so you can't use the sampling theorem. Try (x+1)^-n instead...

回答3:

I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.

The included distribution presets are:

uniform_int
bernoulli_distribution
geometric_distribution
poisson_distribution
binomial_distribution
uniform_real
exponential_distribution
normal_distribution
gamma_distribution

When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.

Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F)

回答4:

I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although in R, the code is so simple as to be (pseudo)-pseudo-code.

One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is the fact that the power law exponent n (which is typically denoted as alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, and typically between 2 and 3.

x0 and x1 stand for the lower and upper limits of the distribution.

So here it is:

x1 = 5           # Maximum value
x0 = 0.1         # It can't be zero; otherwise X^0^(neg) is 1/0.
alpha = -2.5     # It has to be negative.
y = runif(1e5)   # Number of samples
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
hist(x, prob = T, breaks=40, ylim=c(0,10), xlim=c(0,1.2), border=F, 
col="yellowgreen", main="Power law density")
lines(density(x), col="chocolate", lwd=1)
lines(density(x, adjust=2), lty="dotted", col="darkblue", lwd=2)

or plotted in logarithmic scale:

h = hist(x, prob=T, breaks=40, plot=F)
     plot(h$count, log="xy", type='l', lwd=1, lend=2, 
     xlab="", ylab="", main="Density in logarithmic scale")

Here is the summary of the data:

> summary(x)
   Min.   1st Qu.  Median    Mean   3rd Qu.    Max. 
  0.1000  0.1208  0.1584    0.2590  0.2511   4.9388

Random number generator that produces a power-law

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

Random number generator that produces a power-law

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮