I need to random some elements from an array. I'm doing that by randomizing the index $array[int(rand(100))]
. I want some of the elements to appear more often. How do I do it?
I thought of a stupid solution of having those elements duplicated several times in the array, but I'm sure you guys can do better.
You want to generate a weighted random sample. The linked question covers both with and without replacement.
Seems that a rather natural way involves setting up a binary search. Let's say a bunch of people are enrolled in a raffle, where people are allowed to submit their names as many times as they want. We have the following names with the following number of submissions:
- Juliet: 2
- Jenny: 11
- Jessica: 7
- Jan: 1
- Jane: 1
- Jean: 5
Now if we want to randomly select a name out of the bag, we just assign each name a range starting from 0:
- Juliet: 0, 1
- Jenny: 2, 12
- Jessica: 13, 19
- Jan: 20, 20
- Jane: 21, 21
- Jean: 22, 26
Alright, so we have an array of adjacent ranges, were each range is between 0 and 26. We use a modified binary search to find our target item (pseudocode):
let raffle := { Juliet: 0, 1;
Jenny: 2, 12;
Jessica: 13, 19;
Jan: 20, 20;
Jane: 21, 21;
Jean: 22, 26 }
let search minIndex maxIndex rangeValue =
if minIndex > maxIndex then
failwith "Not found"
let selectedIndex = (minIndex + maxIndex) / 2
let item = raffle[selectedIndex]
if item.range.min >= rangeValue && item.range.max <= rangeValue
return item.name
elif item.range.min < rangeValue
return search minIndex (selectedIndex - 1) rangeValue
else
return search (selectedIndex + 1) maxIndex rangeValue
This page provides the theory for generating random numbers from arbitrary distribution.
If you know the frequency with which you want the random numbers to appear, you could use the rand() function. Let's say you want the number 0 to appear 33% of the time, and 1 to appear the other 66%. Then you would check if rand() < 0.33, and return some index. Otherwise, return another index. This is just one way to do it.
Another option is to have a structure similar to the following: (pardon my language, I don't actually know Perl all that well)
[
(1, 10),
(2, 50),
(3, 80),
(4, 100)
]
Then when you get the value from int(rand(100))
you can compare it to each of the second elements in turn and return the first element.
The most general solution is a function that acts as an (inverse) cumulative distribution function: a function that maps from a (uniform) distribution from 0.0 to 1.0 to whatever distribution that you want.
Juliet gave an excellent way to implement one of them.
this sub' gets an array of element and weight -A(10)B(30)C(5)-and random one of the elements according the the weights.
sub we_rand {
my $line=$_[0];#get the elements array
my @b = split(/\(\d+\)/,$line);#b now holds each elemnet from the array
my @a = "";
my $i=0;
my $tmp;
while ($line=~m/(\(\d+\)+)/g) {
$tmp=$1;#temp gets the weight
if ($tmp=~m/\d+/g) {
if ($i>0){
$a[$i]=$&+$a[$i-1];#if weight is grather then 0 -each cell of
#a sums up the wheights up to it
}
else {
$a[$i]=$&;
}
}
$i++;
}
if ($i>1){
my $n=int(rand($a[$i-1])+1);#rand a number in the boundries of
#the total weight of all the elements
my $s=scalar(@b);
#go through a and compare to the randomized num-then take the
#element from b
for ( $i=0;$i<scalar(@b);$i++){
if($n<=$a[$i])
{
$c=$b[$i];
last;
}
}
} else {
$c=$b[0];#if only one element
}
return $c;
}
you call the sub' like this:
my $rand=we_rand(A(10)B(30)C(5)D(17));
-#$rand=B