可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Imagine you have an array of hashes representing competitor and their probability to win the prize (a float between 0 and 1). Like:
[ {:name => "Adam" , :prob => 0.5}
{:name => "Ben" , :prob => 1.0}
{:name => "Chris" , :prob => 0.1}
{:name => "Daniel" , :prob => 0.2}
{:name => "Ed" , :prob => 0.7}
{:name => "Frey" , :prob => 0.5}
{:name => "Gilbert" , :prob => 0.3}
]
I would like to have an algorithm on which I can select three winners using random numbers but respecting the probability of each person.
The total probability of the sample is 3.3
A logical approach would be to calculate the random value like:
val = rand(33)/10.0
And scan the array until I get the person which reaches the random number.
This approach works but it implies a scan in the array.
I wonder if there would be a more straightforward solution. Any ideas?
PS: Imagine the array might have a great amount of elements.
回答1:
I was thinking about this and I think my result makes sense:
- sort the vector according to the probability: [a=0.1,b=0.2,c=0.3,d=0.4]
- choose a random number (e.g. 0.5)
- iterate from the beginning and sum the probability values and stop when it is higher:
answer = 0.1 + 0.2 + 0.3. So, 0.6 > 0.5, we value 'c'
my gist over this is that the values in the end of the vector should have a higher probability of being chosen. I have implemented in python:
values = [0.1,0.2,0.3,0.4]
count_values = len(values)*[0]
answer = len(values)*[0]
iterations = 10000
for i in range(0,iterations):
rand = float(random.randint(0,iterations))/iterations
count = 0
sum = 0
while sum <= rand and count <= len(values):
sum += values[count]
count += 1
count_values[count-1]+=1
for i in range(0,len(count_values)):
answer[i] = float(count_values[i])/iterations
and running several times, I calculated the probability of all elements being chosen, that should match our initial probability:
[0.1043, 0.196, 0.307, 0.3927]
[0.1018, 0.2003, 0.2954, 0.4025]
[0.0965, 0.1997, 0.3039, 0.3999]
回答2:
Create a loop which runs till 3 winners are selected. In this loop,generate a particular random number using any random method available in the programming language of your choice. After this,start iterating through the users. If any user's probability is lesser than this random number,accept that user as a winner. If in any iteration of the loop a winner isn't selected,for example,in a case where the lowest probability in your list is 0.2 and the random number generated is 0.1,in that case,continue to the next iteration of the loop. Break out of the loop when you get 3 winners. A probable pseudo code for such could be as follows:
int count=0;
while(count<3){
temp=GenerateRandomNumber()
int userIndex= AcceptWinner(UserListProbability,temp)
//here keep iterating through the users to check which user's probability is less than temp and returns the index of the winner in the List
if(userIndex==-1)//No winner selected
continue;
else{
count++;
Print List(userIndex)
}
}
Note:the list should be sorted
回答3:
I'm assuming that in your example "probability" means "weight" (so folks with a 1.0 probability aren't guaranteed to win and the total probability won't sum to 1.0)
You could build a tree of nodes where the leaf nodes contained your individual entries:
leaf1 = {:name => "Adam" , :prob => 0.5}
leaf2 = {:name => "Ben" , :prob => 1.0}
and each node contained the sum of the nodes under it
node1 = { :prob_sum => 1.5 , :children=> [ leaf1, leaf2] }
The root node then contains the sum of the whole structure
root_node = { :prob_sum => 33 , :children => [ leaf9, leaf10] }
Then you pick a random number between zero and the sum contained in the root node.
my_random = rand( root_node.prob_sum )
Then traverse the tree. Each node contains the sum of all nodes under it, so if your random number is larger than the node you subtract that node's value and skip that branch.
def find_node( my_random ):
c = children.first()
while( c ):
if ( c.prob_sum < my_random ):
return c.find_node(my_random)
my_random -= c.prob_sum
c = c.next
Assuming you've built a balanced tree you should get results in O(log n)
Alternatively, you could get the same result by adding a running total field to your current data set and doing a binary search based on that running total. That would probably be easier to implement, but would only be applicable if your working set could fit in memory.
回答4:
There is also the approach which is working today but has some problems.
Today I create an array and put the probability*100 entries for each person in that array.
Then it is possible to do a random directly to the array content.
First problem is that it is costly on every aspect (memory, processing, ...) and it does not scale.
The second problem I have when choosing the second and third person once either I take the first out or I do a loop with random until a different person is picked up.
Nevertheless, for small datasets (like I have so far but will increase along time), this solution works fine.