I need to generate a set of random numbers within an interval that should also have a specified mean value. For instance min = 1000, max = 10000 and a mean of 7000. I know how to create numbers within a range, but I am struggling with the mean value part. Is there a function that I can use?
What you're looking for is done most easily with the so-called acceptance-rejection method.
- Split your interval into smaller intervals.
- Specify a probability density function (PDF). It can be a very simple one, like a step function. For a Gaussian distribution you would have the left and right steps lower than your middle step (see the image below, which shows a more general distribution).
- Generate a random number in the whole interval, plus a second random number between zero and the maximum of your PDF. If the second number is greater than the value of your PDF at the first, reject the pair.
- Repeat the steps until you get the desired number of points.
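For instance, a minimal sketch using the question's numbers (the split point and the 1:5 weighting are my own choices, picked so the mean lands at 7000):

```python
import random

def step_pdf(x):
    # un-normalized step-function PDF: weight the upper half of [1000, 10000]
    # five times more heavily than the lower half
    return 1.0 if x < 5500 else 5.0

PDF_MAX = 5.0

def sample():
    # acceptance-rejection against the step PDF
    while True:
        x = random.uniform(1000, 10000)
        y = random.uniform(0, PDF_MAX)
        if y < step_pdf(x):
            return x

samples = [sample() for _ in range(100000)]
print(sum(samples) / len(samples))   # ~7000: (1/6) * 3250 + (5/6) * 7750
```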
EDIT 1
Proof of concept on a Gaussian PDF.
Ok, so the basic idea is shown in graph (a).

1. Define some function f(x). It can serve as a PDF if it satisfies: 1) f(x) >= 0 everywhere, and 2) it's normalized (meaning it sums, or integrates, up to the value 1).
2. Determine the maximum value max and the "zero points" (z1 < z2) of the PDF. Some PDFs have their zero points at infinity. In that case, determine cutoff points (z1, z2) outside of which PDF(x) < eta, where you pick eta yourself. Basically: set some small-ish value eta and then say your zero points are those values for which PDF(x) is smaller than eta.
3. Define the box Ch(z1, z2, max) of your random generator. This is the region in which you generate your random variables.
4. Generate a random variable x such that z1 < x < z2.
5. Generate a second, independent random variable y in the range (0, max). If the value of y is larger than PDF(x), reject both randomly generated values (x, y) and go back to step 4. If y is smaller than PDF(x), accept x as a randomly generated point on the distribution and return it.

Here's the code that reproduces similar behavior for a Gaussian PDF.
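A minimal Python sketch of what that code could look like (the Gaussian form a*exp(-(x-b)^2/(2c^2)), the coefficients (1, 2, 3) and the cutoffs -5.0 and 10.0 follow the choices explained in the steps below):

```python
import math
import random

def gaus(x, a, b, c):
    # general Gaussian shape: height a, centre b, width c
    return a * math.exp(-(x - b)**2 / (2.0 * c**2))

def random_on_a_gaus_distribution(lower, upper, max_val, a, b, c):
    # acceptance-rejection: draw (x, y) pairs until y falls under the curve
    while True:
        x = random.uniform(lower, upper)   # step 4: candidate in (z1, z2)
        y = random.uniform(0.0, max_val)   # step 5: height in (0, max)
        if y < gaus(x, a, b, c):
            return x, y                    # y is kept only so it can be plotted

# coefficients (1, 2, 3), eyeballed cutoffs -5.0 and 10.0, maximum 1.0
points = [random_on_a_gaus_distribution(-5.0, 10.0, 1.0, 1, 2, 3) for _ in range(10000)]
```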
Step 1

So first we define the general form of a Gaussian PDF in a function called gaus. Simple.

Then we define a function random_on_a_gaus_distribution which uses that well-defined Gaussian function. In an experiment/measurement we would get the coefficients a, b, c by fitting our function. I picked some arbitrary ones (1, 2, 3) for this example; you can pick the ones that satisfy your HW assignment (that is, coefficients that make a Gaussian with a mean of 7000).

Step 2 and 3

I used Wolfram Mathematica to plot gaus with the parameters 1, 2, 3 to see what the most appropriate values of max and (z1, z2) would be. You can see the graph yourself. The maximum of the function is 1.0, and via the ancient scientific method called eyeballin' I estimated that the cutoff points are -5.0 and 10.0.

To make random_on_a_gaus_distribution more general you could follow step 2 more rigorously: define eta and then evaluate your function at successive points until the PDF gets smaller than eta. The danger is that your cutoff points can end up very far apart, so this could take a long time for functions that decay very slowly. Additionally, you have to find the maximum yourself. This is generally tricky; however, the equivalent problem of minimizing the negative of the function is a more standard one. It can still be tricky in the general case, but it's not "undoable". The easiest way is to cheat a bit like I did and just hard-code this for a couple of functions only.
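If you did want to automate that step, a rough sketch (mine; it assumes a single-peaked PDF, a known starting point near the peak, and a fixed scan step) could look like:

```python
def find_max_and_cutoffs(pdf, start, step=0.01, eta=1e-6):
    # walk outward from `start` in both directions until the PDF drops below eta,
    # keeping track of the largest value seen along the way
    max_val = pdf(start)
    x = start
    while pdf(x) >= eta:        # walk right
        max_val = max(max_val, pdf(x))
        x += step
    z2 = x
    x = start
    while pdf(x) >= eta:        # walk left
        max_val = max(max_val, pdf(x))
        x -= step
    z1 = x
    return z1, z2, max_val
```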
Step 4 and 5

And then you bash away. Just keep creating new points until you reach the desired number of hits. DO NOTICE that the returned number x is a random number. You wouldn't be able to find a logical link between two successively created x values, or between the first created x and the millionth.

However, the number of accepted x values in the interval around the x_max of our distribution is greater than the number of x values accepted in intervals for which PDF(x) < PDF(x_max).

This just means that your random numbers will be weighted within the chosen interval in such a manner that a larger PDF value at a random variable x corresponds to more random points accepted in a small interval around that value than around any other value xi for which PDF(xi) < PDF(x).

I returned both x and y to be able to plot the graph below, but what you're actually looking to return is just the x. I did the plots with matplotlib.

It's probably better to show just a histogram of the randomly created variable on the distribution. It shows that the x values around the mean value of your PDF are the most likely ones to get accepted, and therefore more randomly created variables with those approximate values will be created.

Additionally, I assume you would be interested in an implementation of the KISS random number generator. IT IS VERY IMPORTANT THAT YOU HAVE A VERY GOOD GENERATOR. I dare say that KISS probably doesn't quite cut it (the Mersenne Twister is often used instead).
Random.h
Kiss.cpp
Additionally, there are the standard C random generators: CRands.cpp
It's worth also commenting on graph (b) above. When you have a very badly behaved PDF, PDF(x) will vary significantly between large values and very small ones.

The issue is that the box Ch(x) will match the extreme values of the PDF well, but since we also create a random variable y where PDF(x) is small, the chance of accepting such a value is minute! It is far more likely that the generated y value will be larger than PDF(x) at that point. This means that you'll spend a lot of cycles creating numbers that won't get chosen and that all your chosen random numbers will be tightly bound to the region around the max of your PDF.

That's why it's often useful not to use the same Ch(x) box everywhere, but to define a parametrized set of intervals. However, this adds a fair bit of complexity to the code. Where do you set your limits? How do you deal with borderline cases? When and how do you determine that you actually need to switch to this approach? Calculating max might not be as simple now, depending on the method you originally envisioned for doing this.

Additionally, you now have to correct for the fact that numbers get accepted more easily in the areas where your Ch(x) box height is lower, which skews the original PDF.

This can be corrected by weighing the numbers created under the lowered boundary by the ratio of the heights of the higher and lower boundaries; basically, you repeat the y step one more time. Create a random number z from 0 to 1 and compare it to the ratio lower_height/higher_height, which is guaranteed to be < 1. If z is smaller than the ratio, accept x; if it's larger, reject it.
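A rough two-box illustration of that correction (entirely my own construction; it assumes each box height still bounds the PDF within its region):

```python
import random

def sample_two_boxes(pdf, lo, split, hi, h_high, h_low):
    # Region [lo, split) uses a tall box of height h_high, region [split, hi) a
    # lowered box of height h_low. Samples drawn under the lowered box get an
    # extra acceptance test with probability h_low / h_high so the PDF is not skewed.
    while True:
        x = random.uniform(lo, hi)
        in_low_box = x >= split
        h = h_low if in_low_box else h_high
        y = random.uniform(0.0, h)
        if y >= pdf(x):
            continue            # ordinary rejection against the local box
        if in_low_box and random.random() >= h_low / h_high:
            continue            # the correction step described above
        return x
```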
Generalizations of the code presented are also possible by writing a function that takes an object pointer instead. By defining your own class, i.e. function, which describes functions in general, has an eval method at a point, can store its own parameters, and can calculate and store its own max/min values and zero/cutoff points, you wouldn't have to pass or define them in a function like I did.

Good luck, have fun!
tl;dr: Raise a uniform 0-to-1 distribution to the power (1 - m) / m, where m is the desired mean (between 0 and 1). Shift/scale as desired.

I was curious about how to implement this. I figured a trapezoid would be the easiest method, but then you're limited in that the most extreme mean you can get is with a triangle, which isn't that extreme. The math started getting hard, so I reverted to a purely empirical method that seems to work pretty well.
Anyway, for a distribution, how about starting with the uniform [0, 1) distribution and raising the values to some arbitrary power? Square them and the distribution shifts toward 0; take the square root and it shifts toward 1. You can go to whatever extreme you want and shove the distribution as hard as you want.
(Everything's written in Python, but it should be easy enough to translate. If something's unclear, just ask. random.random() returns floats in [0, 1).)

So, how do we adjust that power? Well, how does the mean seem to shift with varying powers?
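A quick way to look at that (my own sketch of the experiment):

```python
import random

def mean_of_power(power, n=100000):
    # empirical mean of random.random() ** power
    return sum(random.random() ** power for _ in range(n)) / n

for power in [0.01, 0.1, 0.5, 1.0, 2.0, 10.0, 100.0]:
    print(power, mean_of_power(power))
# plotting the mean against log10(power) gives an S-shaped curve
```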
Looks like some sort of sigmoid curve. There are lots of sigmoid functions, but hyperbolic tangent seems to work pretty well.
Not 100% there; let's try scaling it in the X direction...
The fitter says the best scaling factor is 1.1514088816214016. The residuals are actually pretty low, so sounds good.
Implementing the inverse of all the math I didn't talk about looks like:
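One plausible form of that inverse (assuming the tanh is scaled to run from 1 down to 0 across the means; the function name is mine):

```python
import math

SCALE = 1.1514088816214016  # fitted scaling factor from above

def power_for_mean(m):
    # forward model: mean ~= 0.5 * (1 - tanh(SCALE * log10(power)))
    # solving for the power gives:
    return 10 ** (math.atanh(1 - 2 * m) / SCALE)
```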
That gives us the power to use in the first function to get whatever mean we want for the distribution. A factory function can return a method that churns out a bunch of numbers from the distribution with the desired mean:
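Something along these lines (my own sketch of such a factory, reusing the inverse mapping from the previous snippet):

```python
import math
import random

SCALE = 1.1514088816214016  # fitted scaling factor from above

def power_for_mean(m):
    return 10 ** (math.atanh(1 - 2 * m) / SCALE)

def distribution_with_mean(m):
    # factory: returns a no-argument sampler whose values average out to ~m
    p = power_for_mean(m)
    return lambda: random.random() ** p

gen = distribution_with_mean(0.7)
samples = [gen() for _ in range(100000)]
print(sum(samples) / len(samples))   # close to 0.7
```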
How does it do? Reasonably well, out to 3-4 decimals.
A more succinct function that just gives you a value given a mean (not a factory function):
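For example (a sketch; the names are mine):

```python
import math
import random

SCALE = 1.1514088816214016  # fitted scaling factor from above

def random_with_mean(m):
    # one-shot: compute the power from the desired mean, then draw a single value
    return random.random() ** (10 ** (math.atanh(1 - 2 * m) / SCALE))
```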
Edit: fitting against the natural log of the power instead of log10 gave a scaling factor suspiciously close to 0.5. Doing some math to simplify out the arctanh gives:
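A sketch of that simplified form:

```python
import random

def random_with_mean(m):
    # exact: for uniform U on [0, 1), the mean of U ** ((1 - m) / m) is m
    return random.random() ** ((1 - m) / m)
```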
From here it should be fairly easy to shift, rescale, and round off the distribution. Truncating to integers might end up shifting the mean by one (or half a unit?), so that's an unsolved problem (if it matters).
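For the question's numbers, a rough sketch of that shift/rescale (my own; the rounding caveat above still applies):

```python
import random

def random_in_range(lo, hi, mean):
    # map the desired mean into [0, 1], draw, then scale back into [lo, hi]
    m = (mean - lo) / (hi - lo)
    return lo + (hi - lo) * random.random() ** ((1 - m) / m)

samples = [random_in_range(1000, 10000, 7000) for _ in range(100000)]
print(sum(samples) / len(samples))   # should land near 7000
```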