A couple of answers suggest picking random percents and taking the differences between them. As Nikita Ryback points out, this will not give the uniform distribution over all possibilities; in particular, zeroes will be less frequent than expected.
To fix this, think of starting with 100 'percents' and inserting dividers. I will show an example with 10:
% % % % % % % % % %
There are eleven places we could insert a divider: between any two percents or at the beginning or end. So insert one:
% % % % / % % % % % %
This represents choosing four and six. Now insert another divider. This time, there are twelve places, because the divider already inserted creates and extra one. In particular, there are two ways to get
% % % % / / % % % % % %
either inserting before or after the previous divider. You can continue the process until you have as many dividers as you need (one fewer than the number of percents.)
% % / % / % / / % % % / % % % /
This corresponds to 2,1,1,0,3,3,0.
We can prove that this gives the uniform distribution. The number of compositions of 100 into k parts is the binomial coefficient 100+k-1 choose k-1. That is
(100+k-1)(100+k-2)...101 / (k-1)(k-2)*...*2*1
Thus the probability of choosing any particular composition is the reciprocal of this. As we insert dividers one at a time, first we choose from 101 positions, then 102, 103, etc until we get to 100+k-1. So the probability of any particular sequence of insertions is 1 / (100+k-1)*...*101. How many insertion sequences give rise to the same composition? The final composition contains k-1 dividers. They could have been inserted in any order, so there are (k-1)! sequences that give rise to a given composition. So the probability of any particular composition is exactly what it should be.
In actual code, you probably wouldn't represent your steps like this. You should be able to just hold on to numbers, rather than sequences of percents and dividers. I haven't thought about the complexity of this algorithm.
Generate n random integers with any range (call them a[1]
..a[n]
). Sum up your integers and call that b
. Your percentages will be [a[1]/b, ..., a[n]/b]
.
Edit: good points, rounding the results to total exactly 100 is non-trival. One approach would be to take the floor of a[x]/b
for x
in 1..n
as your integers, then distribute the remainding units 100-(sum of integers)
randomly. I'm not sure if this would introduce any bias into the result.
You possibly need to define what you really mean by "biased" - but if all you care about is that the distribution of the numbers is independent of their position, then you can simply create the numbers in a "biased" way and then randomise their positions.
Another "unbiased" method would be to create n-1 random percentages, sort them (call this x1 x2 x3...) and then define your final percentages to be:
x1
x2 - x1
x3 - x2
...
100 - x(n-1)
That way you will get n random numbers that add to 100.
The key is to generate N random numbers between 0 and 100 but to use these as "markers" rather than the final sequence of numbers to output. Then you iterate through your list of markers in ascending order, calculating each percentage to output as (current marker - previous marker).
This will give a much more even distribution than simply generating and outputting each number one at a time.
Example
import java.util.Random;
import java.util.TreeSet;
import java.util.SortedSet;
public class Main {
public static void main(String[] args) {
Random rnd = new Random();
SortedSet<Integer> set = new TreeSet<Integer>();
for (int i=0; i<9; ++i) {
set.add(rnd.nextInt(101));
}
if (set.last() < 100) {
set.add(100);
}
int prev = 0;
int total = 0;
int output;
for (int j : set) {
output = j - prev;
total += output;
System.err.println(String.format("Value: %d, Output: %d, Total So Far: %d", j, output, total));
prev = j;
}
}
}
Output
$ java Main
Value: 0, Output: 0, Total So Far: 0
Value: 2, Output: 2, Total So Far: 2
Value: 55, Output: 53, Total So Far: 55
Value: 56, Output: 1, Total So Far: 56
Value: 57, Output: 1, Total So Far: 57
Value: 69, Output: 12, Total So Far: 69
Value: 71, Output: 2, Total So Far: 71
Value: 80, Output: 9, Total So Far: 80
Value: 92, Output: 12, Total So Far: 92
Value: 100, Output: 8, Total So Far: 100
To be precise it depends on exactly how you want the samples to be unbiased. Here is a rough way which will roughly give you a good result.
- Generate
n-1
integers from 0,..100, say a[i]
for i = 0, to n-2
.
- Let
total
be the sum of these numbers
- Compute
b[i] = floor(100*a[i]/total)
for i = 0, to n-2
- Set
b[n-1] = 100 - (b[0] + ... b[n-2])
.
Then b is your resulting array of percentages.
The last one will be biased, but the rest should be uniform.
Of course if you want to do this in a more accurate way you'll have to use Gibbs sampling or Metropolis hastings.
Once you pick the numbers with the method you describe, shuffle the order of the numbers. This way the final list of numbers has a more even distribution.
However, note that no matter what you do, you can't get a perfectly even distribution since once you start to choose numbers, your random trials are not independent. See ataylor's answer.
Note also that the algorithm you describe might not give you the required output. The last number cannot be random since it must make the sum equal 100.
First, obvious solution.
do
int[] a = new int[n];
for (int i = 0; i < n; ++i) {
a[i] = random number between 0 and 100;
}
until sum(a) == 100;
It's not perfect in terms of complexity (number of iterations to reach sum 100 can be quite large), but distribution is surely 'unbiased'.
edit
Similar problem: how to generate random point in a circle with radius 1 and center in (0, 0)? Solution: continue generating random points in range (square) [-1..1,-1..1] until one of them fits the circle :)
Imagine you have 100 stones and N buckets to place them in. You can take all 100 and place on in a random bucket. This way the total will be the 100 you started with and there will be no bias between any bucket.
public static int[] randomBuckets(int total, int n_buckets) {
int[] buckets = new int[n_buckets];
Random rand = new Random();
for(int i=0;i<total;i++)
buckets[rand.nextInt(n_buckets)]++;
return buckets;
}
public static void main(String... args) {
for(int i=2; i<=10;i++)
System.out.println(Arrays.toString(randomBuckets(100, i)));
}
Prints
[55, 45]
[38, 34, 28]
[22, 21, 32, 25]
[28, 24, 18, 15, 15]
[17, 14, 13, 21, 18, 17]
[17, 19, 14, 15, 6, 15, 14]
[11, 14, 14, 14, 4, 17, 9, 17]
[13, 12, 15, 12, 8, 10, 9, 11, 10]
[11, 13, 12, 6, 6, 11, 13, 3, 15, 10]
As the count increases, the distribution approaches uniform.
System.out.println(Arrays.toString(randomBuckets(100000000, 100)));
Prints
[1000076, 1000612, 999600, 999480, 998226, 998303, 1000528, 1000450, 999529,
998480, 998903, 1002685, 999230, 1000631, 1001171, 997757, 1000349, 1000527,
1002408, 1000852, 1000450, 999318, 999453, 1000099, 1000759, 1000426, 999404,
1000758, 1000939, 999950, 1000493, 1001396, 1001007, 999258, 1001709, 1000593,
1000614, 1000667, 1000168, 999448, 999350, 1000479, 999991, 999778, 1000513,
998812, 1001295, 999314, 1000738, 1000211, 999855, 999349, 999842, 999635,
999301, 1001707, 998224, 1000577, 999405, 998760, 1000036, 1000110, 1002471,
1000234, 1000975, 998688, 999434, 999660, 1001741, 999834, 998855, 1001009,
999523, 1000207, 998885, 999598, 998375, 1000319, 1000660, 1001727, 1000546,
1000438, 999815, 998121, 1001128, 1000191, 998609, 998535, 999617, 1001895,
999230, 998968, 999844, 999392, 999669, 999407, 998380, 1000732, 998778, 1000522]