Java: random integer with non-uniform distribution

2019-01-22 01:37发布

How can I create a random integer n in Java, between 1 and k with a "linear descending distribution", i.e. 1 is most likely, 2 is less likely, 3 less likely, ..., k least likely, and the probabilities descend linearly, like this:

enter image description here

I know that there are dosens of threads on this topic already, and I apologize for making a new one, but I can't seem to be able to create what I need from them. I know that using import java.util.*;, the code

Random r=new Random();
int n=r.nextInt(k)+1;

creates a random integer between 1 and k, distributed uniformly.

GENERALIZATION: Any hints for creating an arbitrarily distributed integer, i.e. f(n)=some function, P(n)=f(n)/(f(1)+...+f(k))), would also be appreciated, for example: enter image description here.

10条回答
Summer. ? 凉城
2楼-- · 2019-01-22 02:15

This should give you what you need:

public static int getLinnearRandomNumber(int maxSize){
    //Get a linearly multiplied random number
    int randomMultiplier = maxSize * (maxSize + 1) / 2;
    Random r=new Random();
    int randomInt = r.nextInt(randomMultiplier);

    //Linearly iterate through the possible values to find the correct one
    int linearRandomNumber = 0;
    for(int i=maxSize; randomInt >= 0; i--){
        randomInt -= i;
        linearRandomNumber++;
    }

    return linearRandomNumber;
}

Also, here is a general solution for POSITIVE functions (negative functions don't really make sense) along the range from start index to stopIndex:

public static int getYourPositiveFunctionRandomNumber(int startIndex, int stopIndex) {
    //Generate a random number whose value ranges from 0.0 to the sum of the values of yourFunction for all the possible integer return values from startIndex to stopIndex.
    double randomMultiplier = 0;
    for (int i = startIndex; i <= stopIndex; i++) {
        randomMultiplier += yourFunction(i);//yourFunction(startIndex) + yourFunction(startIndex + 1) + .. yourFunction(stopIndex -1) + yourFunction(stopIndex)
    }
    Random r = new Random();
    double randomDouble = r.nextDouble() * randomMultiplier;

    //For each possible integer return value, subtract yourFunction value for that possible return value till you get below 0.  Once you get below 0, return the current value.  
    int yourFunctionRandomNumber = startIndex;
    randomDouble = randomDouble - yourFunction(yourFunctionRandomNumber);
    while (randomDouble >= 0) {
        yourFunctionRandomNumber++;
        randomDouble = randomDouble - yourFunction(yourFunctionRandomNumber);
    }

    return yourFunctionRandomNumber;
}

Note: For functions that may return negative values, one method could be to take the absolute value of that function and apply it to the above solution for each yourFunction call.

查看更多
兄弟一词,经得起流年.
3楼-- · 2019-01-22 02:18

There are lots of ways to do this, but probably the easiest is just to generate two random integers, one between 0 and k, call it x, one between 0 and h, call it y. If y > mx + b (m and b chosen appropriately...) then k-x, else x.

Edit: responding to comments up here so I can have a little more space.

Basically my solution exploits symmetry in your original distribution, where p(x) is a linear function of x. I responded before your edit about generalization, and this solution doesn't work in the general case (because there is no such symmetry in the general case).

I imagined the problem like this:

  1. You have two right triangles, each k x h, with a common hypotenuse. The composite shape is a k x h rectangle.
  2. Generate a random point that falls on each point within the rectangle with equal probability.
  3. Half the time it will fall in one triangle, half the time in the other.
  4. Suppose the point falls in the lower triangle.
    • The triangle basically describes the P.M.F., and the "height" of the triangle over each x-value describes the probability that the point will have such an x-value. (Remember that we're only dealing with points in the lower triangle.) So by yield the x-value.
  5. Suppose the point falls in the upper triangle.
    • Invert the coordinates and handle it as above with the lower triangle.

You'll have to take care of the edge cases also (I didn't bother). E.g. I see now that your distribution starts at 1, not 0, so there's an off-by-one in there, but it's easily fixed.

查看更多
淡お忘
4楼-- · 2019-01-22 02:20

The first solution that comes to mind is to use a blocked-array. Each index would specify a range of values depending on how "probable" you want it to be. In this case, you would use a wider range for 1, less wider for 2, and so on until you reach a small value (lets say 1) for k.

int [] indexBound = new int[k];
int prevBound =0;
for(int i=0;i<k;i++){
    indexBound[i] = prevBound+prob(i);
    prevBound=indexBound[i];
}
int r = new Random().nextInt(prevBound);
for(int i=0;i<k;i++){
    if(r > indexBound[i];
        return i;
}

Now the problem is just finding a random number, and then mapping that number to its bucket. you can do this for any distribution provided you can discretize the width of each interval. Let me know if i am missing something either in explaining the algorithm or its correctness. Needless to say, this needs to be optimized.

查看更多
Lonely孤独者°
5楼-- · 2019-01-22 02:21

Something like this....

class DiscreteDistribution
{
    // cumulative distribution
    final private double[] cdf;
    final private int k;

    public DiscreteDistribution(Function<Integer, Double> pdf, int k)
    {
        this.k = k;
        this.cdf = new double[k];
        double S = 0;
        for (int i = 0; i < k; ++i)
        {
            double p = pdf.apply(i+1);         
            S += p;
            this.cdf[i] = S;
        }
        for (int i = 0; i < k; ++i)
        {
            this.cdf[i] /= S;
        }
    }
    /**
     * transform a cumulative distribution between 0 (inclusive) and 1 (exclusive)
     * to an integer between 1 and k.
     */
    public int transform(double q)
    {
        // exercise for the reader:
        // binary search on cdf for the lowest index i where q < cdf[i]
        // return this number + 1 (to get into a 1-based index.
        // If q >= 1, return k.
    }
}
查看更多
登录 后发表回答