Exponential-decay-like-random-distribution and Dis

2019-06-11 03:39发布

问题:

sadly I'm not really experienced in using random numbers in programming, despite the use of uniform integers in range. Therefore i have to questions regarding this topic.

Question 1 (more specific):

I'm looking for a way to chose array elements (dynamic size, but known) according to a probability distribution similar to the curve of "exponential decay" (http://en.wikipedia.org/wiki/Exponential_decay). Meaning: i want to prefer to chose the first elements rather than the others. I want an monotonic decreasing function (no growing before decreasing like in many well-known probability-distributions like the gamma-distribution).

Maybe the geometric-distribution is something which i could use? But then i need an answer to my second question regarding the scaling of this distribution to array indexes.

The dual method to prefer choosing the last elements rather than the first would be ok too, of course.

Question 2 (more general): Is there a concept in any implementation which will scale me any continuous random-distribution to a given array-range (including discretization)?

Example: Use a gaussian normal distribution and the result is always a valid index in some array (meaning: the middle elements are preferred).

Could this (link text) be something like i want to use?

Platform and Libraries: I'm programming in C++ and use the boost::random library at the moment (link text), but i'm willing to use something like the the gsl library or other quality libraries.

One more wish: I would prefer a way using some quality libraries rather than some quick-and-dirty custom_functions.

Thanks!

回答1:

I think breaking this problem into two steps is a good place to start. First, if you had a discrete probability distribution, then the problem of drawing from this distribution isn't so bad. Boost random has method for doing this. Scroll down this page to the weighted dice example. It will return an integer from the given probability distribution. You can use this integer to select the element from the array that you are interested in.

The second part of your question is how to get go from a continuous probability distribution like the exponential to a discrete distribution like what was used in the boost example. There are several ways you could go here, but because you said that you wanted a curve "like" exponential decay I will try and explain a quick and simple realizing we are sacrificing some statistical rigor.

The idea here is to sample from a continuous distribution at a set of discrete points and then adjust these points (normalize) so that they sum to one. The code for doing this for an exponential distribution is below.

double expDist(int x, double lambda)  
{
   return(lambda*exp(-lambda*x));
}

//code to sample from this distribution
int i,numElements //where numElements has the number of elements in the array you wish to draw from.
vector<double> output;
double sum,temp
sum=0;
for(i=0;i<numElements;i++)
{
   temp=expDist(i,0.5);  //substitute any value you want for lambda in the second argument
   output.push_back(temp); 
   sum+=temp;
}
//after having sampled at all the points we need to divide each element in the array by the variable sum so that the sum of the values in the array is equal to 1 and thus a valid probability distribution
for(i=0;i<numElements;i++)
{
   output[i]/=sum;
}

You can then feed the output variable into the weighted dice example in the boost library and it should fit your needs. This general method of discrete sampling and then normalizing the vector can work for many different kinds of distributions.



回答2:

The general rule is to pick your numbers in a uniform distribution and then apply a function to transform them into the distribution you want. The function you apply is the inverse of the function you want the random numbers to fall in.

If you want random numbers to be picked with probability proportional to f(x) then you pick a random number from a uniform distribution, u, and apply f^-1(u) and that's your new number.

So if you want your random numbers to be picked with probability proportional to exp(-x), then you pick a uniformly distributed random number and take the ln of it:

double x=ln(rand()); 

should give you random numbers with the probability distribution of exp(-x).

Note: I'm not saying that rand() is a good function to use, you need to research the details of good random number generators. But assuming you have a good random number generator, this is a good solution.

Edit: forgot a minus sign:

double x=-ln(rand()); 

is the right answer.



回答3:

Your Q2: "Example: Use a gaussian normal distribution and the result is always a valid index in some array (meaning: the middle elements are preferred)."

Unless I misunderstand this is NOT true. A random variable following normal distribution can theoretically take values in the range (-infinity,infinity). So, unless you truncate outliers and force the random variable values that fall outside, say, +/- 3 standard deviations to the +/- 3rd standard deviation value, there is no way you can force a normal distribution onto a finite grid.



回答4:

Q1) What it sounds like you're looking for is an exponential distribution. The Boost library comes with an exponential distribution generator.

Q2) This sounds like you want to create a histogram. In the example you site, set the bins of the middle region of your array to represent elements closer to the mean of the normal random values you're drawing from the distribution. If you don't have enough information about the nature of the disribution, you will need to collect a representative sample from the distribution of interest and store that in another array. Using the min and max of the sample, you can then create another array to count how many sampled elements are in each bins A reasonable rule of thumb is that you should have sqrt(n) bins if you have n samples.

Update: As Tryer correctly states, if you are not saving the elements of your distribution into a second array before creating your histogram, you will need to find some way of handling elements that fall outside the bins you have established.



回答5:

I think you are not looking for an exponential distribution since an exponential distribution assumes an unbounded number of elements, and hence yields a bias for the last element of your sequence.

What suits your problem is a Beta distribution with alpha < 1 and beta > 1.