Flooding Bayesian rating creates values out of range

Posted 2019-03-22 09:55

Question:

I'm trying to apply the Bayesian rating formula, but if I give an item a rating of 1 out of 5 hundreds of thousands of times, the final rating ends up greater than 5.

For example, a given item has no votes, and after voting 170,000 times with 1 star its final rating is 5.23. If I vote only 100 times, the result is a normal value.

Here is what I have in PHP.

<?php
// these values came from DB
$total_votes     = 2936;    // total of votes for all items
$total_rating    = 582.955; // sum of all ratings
$total_items     = 202;

// now the specific item, it has no votes yet
$this_num_votes  = 0;
$this_score      = 0;
$this_rating     = 0;

// simulating a lot of votes with 1 star
for ($i=0; $i < 170000; $i++) { 
    $rating_sent = 1; // the new rating, always 1

    $total_votes++; // adding 1 to total
    $total_rating = $total_rating+$rating_sent; // adding 1 to total

    $avg_num_votes = ($total_votes/$total_items); // Average number of votes in all items
    $avg_rating = ($total_rating/$total_items);   // Average rating for all items
    $this_num_votes = $this_num_votes+1;          // Number of votes for this item
    $this_score = $this_score+$rating_sent;       // Sum of all votes for this item
    $this_rating = $this_score/$this_num_votes;   // Rating for this item

    $bayesian_rating = ( ($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating) ) / ($avg_num_votes + $this_num_votes);
}
echo $bayesian_rating;
?>

Even if I flood with 1 or 2:

$rating_sent = rand(1,2);

The final rating after 100,000 votes is over 5.

I just did a new test using

$rating_sent = rand(1,5);

And after 100,000 votes I got a value completely out of range (10.53). I know that in a normal situation no item will get 170,000 votes while all the other items get none. But I wonder whether there is something wrong with my code or whether this is expected behavior of the Bayesian formula given the massive number of votes.

Edit

Just to make it clear, here is a better explanation for some variables.

$avg_num_votes   // SUM(votes given to all items)/COUNT(all items)
$avg_rating      // SUM(rating of all items)/COUNT(all items)
$this_num_votes  // COUNT(votes given for this item)
$this_score      // SUM(rating for this item)
$bayesian_rating // is the formula itself

The formula is: ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes). Taken from here

Answer 1:

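You need to divide by total_votes rather than total_items when calculating avg_rating. In the loop, $total_rating grows by the vote value on every vote while $total_items stays fixed at 202, so $avg_rating grows without bound (to about 844 after 170,000 one-star votes) and drags the result above 5.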

I made the changes and got something that behaves much better here.

http://codepad.org/gSdrUhZ2
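
For reference, here is a minimal sketch of what that change might look like. It is not necessarily the code behind the link. The function names bayesian_rating and simulate_flood are made up for illustration, and it assumes $total_rating holds the sum of individual vote values (the starting value 582.955 from the question may not fit that assumption exactly). Since the formula is a weighted average of avg_rating and this_rating, the result should stay between those two values once avg_rating is a per-vote mean.

```php
<?php
// Sketch only: the question's simulation with avg_rating computed per vote.
// Assumption: $total_rating is the sum of individual vote values.

// The formula from the question, unchanged.
function bayesian_rating(float $avg_num_votes, float $avg_rating,
                         int $this_num_votes, float $this_rating): float
{
    return (($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating))
         / ($avg_num_votes + $this_num_votes);
}

// Hypothetical helper: flood one new item with $num_votes identical votes.
function simulate_flood(int $num_votes, int $rating_sent): float
{
    // Starting values copied from the question
    $total_votes  = 2936;
    $total_rating = 582.955;
    $total_items  = 202;

    $this_num_votes = 0;
    $this_score     = 0;
    $bayesian       = 0.0;

    for ($i = 0; $i < $num_votes; $i++) {
        $total_votes++;
        $total_rating += $rating_sent;
        $this_num_votes++;
        $this_score += $rating_sent;

        $avg_num_votes = $total_votes / $total_items;   // votes per item
        $avg_rating    = $total_rating / $total_votes;  // the fix: mean value per vote
        $this_rating   = $this_score / $this_num_votes; // mean value for this item

        $bayesian = bayesian_rating($avg_num_votes, $avg_rating,
                                    $this_num_votes, $this_rating);
    }
    return $bayesian;
}

echo simulate_flood(170000, 1), PHP_EOL;
```

With this change, flooding the item with 1-star votes should give a result close to 1 instead of something above 5.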