java random percentages

2019-03-24 13:18发布

问题:

I need to generate n percentages (integers between 0 and 100) such that the sum of all n numbers adds up to 100.

If I just do nextInt() n times, each time ensuring that the parameter is 100 minus the previously accumulated sum, then my percentages are biased (i.e. the first generated number will usually be largest etc.). How do I do this in an unbiased way?

回答1:

A couple of answers suggest picking random percents and taking the differences between them. As Nikita Ryback points out, this will not give the uniform distribution over all possibilities; in particular, zeroes will be less frequent than expected.

To fix this, think of starting with 100 'percents' and inserting dividers. I will show an example with 10:

 % % % % % % % % % % 

There are eleven places we could insert a divider: between any two percents or at the beginning or end. So insert one:

 % % % % / % % % % % % 

This represents choosing four and six. Now insert another divider. This time, there are twelve places, because the divider already inserted creates and extra one. In particular, there are two ways to get

 % % % % / / % % % % % % 

either inserting before or after the previous divider. You can continue the process until you have as many dividers as you need (one fewer than the number of percents.)

 % % / % / % / / % % % / % % % / 

This corresponds to 2,1,1,0,3,3,0.

We can prove that this gives the uniform distribution. The number of compositions of 100 into k parts is the binomial coefficient 100+k-1 choose k-1. That is (100+k-1)(100+k-2)...101 / (k-1)(k-2)*...*2*1 Thus the probability of choosing any particular composition is the reciprocal of this. As we insert dividers one at a time, first we choose from 101 positions, then 102, 103, etc until we get to 100+k-1. So the probability of any particular sequence of insertions is 1 / (100+k-1)*...*101. How many insertion sequences give rise to the same composition? The final composition contains k-1 dividers. They could have been inserted in any order, so there are (k-1)! sequences that give rise to a given composition. So the probability of any particular composition is exactly what it should be.

In actual code, you probably wouldn't represent your steps like this. You should be able to just hold on to numbers, rather than sequences of percents and dividers. I haven't thought about the complexity of this algorithm.



回答2:

Generate n random integers with any range (call them a[1]..a[n]). Sum up your integers and call that b. Your percentages will be [a[1]/b, ..., a[n]/b].

Edit: good points, rounding the results to total exactly 100 is non-trival. One approach would be to take the floor of a[x]/b for x in 1..n as your integers, then distribute the remainding units 100-(sum of integers) randomly. I'm not sure if this would introduce any bias into the result.



回答3:

You possibly need to define what you really mean by "biased" - but if all you care about is that the distribution of the numbers is independent of their position, then you can simply create the numbers in a "biased" way and then randomise their positions.

Another "unbiased" method would be to create n-1 random percentages, sort them (call this x1 x2 x3...) and then define your final percentages to be:

x1
x2 - x1
x3 - x2
...
100 - x(n-1)

That way you will get n random numbers that add to 100.



回答4:

This problem is known as uniform sampling from a simplex and Wikipedia gives two algorithms:

http://en.wikipedia.org/wiki/Simplex#Random_sampling

See also these related questions:

  • Sample uniformly at random from an n-dimensional unit simplex
  • Generating a probability distribution


回答5:

The key is to generate N random numbers between 0 and 100 but to use these as "markers" rather than the final sequence of numbers to output. Then you iterate through your list of markers in ascending order, calculating each percentage to output as (current marker - previous marker).

This will give a much more even distribution than simply generating and outputting each number one at a time.

Example

import java.util.Random;
import java.util.TreeSet;
import java.util.SortedSet;

public class Main {
  public static void main(String[] args) {
    Random rnd = new Random();
    SortedSet<Integer> set = new TreeSet<Integer>();

    for (int i=0; i<9; ++i) {
      set.add(rnd.nextInt(101));
    }

    if (set.last() < 100) {
      set.add(100);
    }    

    int prev = 0;
    int total = 0;    
    int output;

    for (int j : set) {
      output = j - prev;
      total += output;
      System.err.println(String.format("Value: %d, Output: %d, Total So Far: %d", j, output, total));
      prev = j;
    }
  }
}

Output

$ java Main
Value: 0, Output: 0, Total So Far: 0
Value: 2, Output: 2, Total So Far: 2
Value: 55, Output: 53, Total So Far: 55
Value: 56, Output: 1, Total So Far: 56
Value: 57, Output: 1, Total So Far: 57
Value: 69, Output: 12, Total So Far: 69
Value: 71, Output: 2, Total So Far: 71
Value: 80, Output: 9, Total So Far: 80
Value: 92, Output: 12, Total So Far: 92
Value: 100, Output: 8, Total So Far: 100


回答6:

Make an array. Randomly drop 100 %'s into each of the parts of that array. Example shows n=7.

import java.util.Random;

public class random100 {
    public static void main (String [] args) {
        Random rnd = new Random();
            int percents[] = new int[7];
            for (int i = 0; i < 100; i++) {
                int bucket = rnd.nextInt(7);
                percents[bucket] = percents[bucket] + 1;
            }
        for (int i = 0; i < 7; i++) {
            System.out.println("bucket " + i + ": " + percents[i]);
        }

    }

}


回答7:

To be precise it depends on exactly how you want the samples to be unbiased. Here is a rough way which will roughly give you a good result.

  1. Generate n-1 integers from 0,..100, say a[i] for i = 0, to n-2.
  2. Let total be the sum of these numbers
  3. Compute b[i] = floor(100*a[i]/total) for i = 0, to n-2
  4. Set b[n-1] = 100 - (b[0] + ... b[n-2]).

Then b is your resulting array of percentages.

The last one will be biased, but the rest should be uniform.

Of course if you want to do this in a more accurate way you'll have to use Gibbs sampling or Metropolis hastings.



回答8:

Once you pick the numbers with the method you describe, shuffle the order of the numbers. This way the final list of numbers has a more even distribution.

However, note that no matter what you do, you can't get a perfectly even distribution since once you start to choose numbers, your random trials are not independent. See ataylor's answer.

Note also that the algorithm you describe might not give you the required output. The last number cannot be random since it must make the sum equal 100.



回答9:

First, obvious solution.

do
    int[] a = new int[n];
    for (int i = 0; i < n; ++i) {
        a[i] = random number between 0 and 100;
    }
until sum(a) == 100;

It's not perfect in terms of complexity (number of iterations to reach sum 100 can be quite large), but distribution is surely 'unbiased'.

edit
Similar problem: how to generate random point in a circle with radius 1 and center in (0, 0)? Solution: continue generating random points in range (square) [-1..1,-1..1] until one of them fits the circle :)



回答10:

I had a similar issue and ended up doing as you said, generating random integers up to the difference of the sum of the existing integers and the limit. I then randomized the order of the integers. It worked pretty well. That was for a genetic algorithm.



回答11:

Imagine you have 100 stones and N buckets to place them in. You can take all 100 and place on in a random bucket. This way the total will be the 100 you started with and there will be no bias between any bucket.

public static int[] randomBuckets(int total, int n_buckets) {
    int[] buckets = new int[n_buckets];
    Random rand = new Random();
    for(int i=0;i<total;i++)
        buckets[rand.nextInt(n_buckets)]++;
    return buckets;
}

public static void main(String... args) {
    for(int i=2; i<=10;i++)
        System.out.println(Arrays.toString(randomBuckets(100, i)));
}

Prints

[55, 45]
[38, 34, 28]
[22, 21, 32, 25]
[28, 24, 18, 15, 15]
[17, 14, 13, 21, 18, 17]
[17, 19, 14, 15, 6, 15, 14]
[11, 14, 14, 14, 4, 17, 9, 17]
[13, 12, 15, 12, 8, 10, 9, 11, 10]
[11, 13, 12, 6, 6, 11, 13, 3, 15, 10]

As the count increases, the distribution approaches uniform.

System.out.println(Arrays.toString(randomBuckets(100000000, 100)));

Prints

[1000076, 1000612, 999600, 999480, 998226, 998303, 1000528, 1000450, 999529, 
998480, 998903, 1002685, 999230, 1000631, 1001171, 997757, 1000349, 1000527, 
1002408, 1000852, 1000450, 999318, 999453, 1000099, 1000759, 1000426, 999404, 
1000758, 1000939, 999950, 1000493, 1001396, 1001007, 999258, 1001709, 1000593,
1000614, 1000667, 1000168, 999448, 999350, 1000479, 999991, 999778, 1000513, 
998812, 1001295, 999314, 1000738, 1000211, 999855, 999349, 999842, 999635, 
999301, 1001707, 998224, 1000577, 999405, 998760, 1000036, 1000110, 1002471, 
1000234, 1000975, 998688, 999434, 999660, 1001741, 999834, 998855, 1001009, 
999523, 1000207, 998885, 999598, 998375, 1000319, 1000660, 1001727, 1000546, 
1000438, 999815, 998121, 1001128, 1000191, 998609, 998535, 999617, 1001895, 
999230, 998968, 999844, 999392, 999669, 999407, 998380, 1000732, 998778, 1000522]


回答12:

Here is some code that I wrote for a program I am creating. I found this thread when trying to solve this exact problem, so hopefully this will help some others. The design was based on reading @eruonna response above.

public static int[] randomNumbers(int numOfNumbers){

    int percentN = numOfNumbers;

    int[] intArray = new int[101];

    //set up the array with values
    for(int i = 0; i < intArray.length; i++){
        intArray[i] = i;
    }

    //set up an array to hold the selected values
    int[] selectionArray = new int[(percentN - 1)];

    //run a for loop to go through and select random numbers from the intArray
    for(int n = 0; n < selectionArray.length; n++){
        int randomNum = (int)(Math.random() * 100);
        selectionArray[n] = intArray[randomNum];
    }

    //bubble sort the items in the selectionArray
    for(int out = (selectionArray.length - 1); out > 1; out--){
        for(int in = 0; in < out; in++){
            if(selectionArray[in] > selectionArray[in + 1]){
                int temp = selectionArray[in];
                selectionArray[in] = selectionArray[in + 1];
                selectionArray[in + 1] = temp;
            }
        }
    }

    //create an array to hold the calculated differences between each of the values to create random numbers
    int[] calculationArray = new int[percentN];

    //calculate the difference between the first item in the array and 0
    calculationArray[0] = (selectionArray[0] - 0);

    //calculate the difference between the other items in the array (except for the last value)
    for(int z = 1; z < (calculationArray.length - 1); z++){
        calculationArray[z] = (selectionArray[z] - selectionArray[z - 1]);
    }

    //calculate the difference for the last item in the array
    calculationArray[(calculationArray.length - 1)] = (100 - selectionArray[(selectionArray.length - 1)]);

    return calculationArray;

}