Algorithm for Finding Good, Reliable Players

Posted 2020-06-06 01:30

I have the following players; each value is the percentage of right answers that player got in a given game.

$players = array
(
    'A' => array(0, 0, 0, 0),
    'B' => array(50, 50, 0, 0),
    'C' => array(50, 50, 50, 50),
    'D' => array(75, 90, 100, 25),
    'E' => array(50, 50, 50, 50),
    'F' => array(100, 100, 0, 0),
    'G' => array(100, 100, 100, 100),
);

I want to be able to pick out the best players, but I also want to take into account how reliable a player is (less entropy = more reliable). So far I've come up with the following formula:

average - standard_deviation / 2
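For reference, here is a minimal sketch of how that formula could be applied to the array above. It assumes the population standard deviation (dividing by n, not n - 1); the helper names `average`, `standard_deviation` and `reliability_score` are made up for this example.

```php
<?php
// Sketch only: score = average - standard_deviation / 2.
// Assumption: population standard deviation (divide by n).

$players = array(
    'A' => array(0, 0, 0, 0),
    'B' => array(50, 50, 0, 0),
    'C' => array(50, 50, 50, 50),
    'D' => array(75, 90, 100, 25),
    'E' => array(50, 50, 50, 50),
    'F' => array(100, 100, 0, 0),
    'G' => array(100, 100, 100, 100),
);

function average(array $values): float
{
    return array_sum($values) / count($values);
}

function standard_deviation(array $values): float
{
    $mean = average($values);
    $squares = array_map(function ($v) use ($mean) {
        return ($v - $mean) ** 2;
    }, $values);
    return sqrt(array_sum($squares) / count($values));
}

function reliability_score(array $values): float
{
    return average($values) - standard_deviation($values) / 2;
}

// array_map keeps the player keys when given a single array.
$scores = array_map('reliability_score', $players);
arsort($scores); // highest score first, keys preserved

foreach ($scores as $name => $score) {
    printf("%s: %.2f\n", $name, $score);
}
```

With this data G comes out on top and A at the bottom, while F (two perfect games, two zeros) lands below the steady 50% players C and E.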

However, I'm not sure if this is an optimal formula, and I would like to hear your thoughts on it. I've been thinking some more about this problem and have come up with a slightly different formula. Here is the revised version:

average - standard_deviation / # of bets

This result would then be used to weight the player's next vote, so for instance a new bet from player C would only count as half a bet.
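A rough sketch of the revised formula and the weighting step might look like this. Dividing the score by 100 to get a 0..1 weight, clamping at zero, and using the population standard deviation are all assumptions for the example; `bet_weight` is a hypothetical name.

```php
<?php
// Sketch only: score = average - standard_deviation / number_of_bets,
// then scaled to 0..1 so it can weight the player's next bet.

function bet_weight(array $results): float
{
    $n = count($results);
    $mean = array_sum($results) / $n;

    $sumSquares = 0.0;
    foreach ($results as $r) {
        $sumSquares += ($r - $mean) ** 2;
    }
    $sd = sqrt($sumSquares / $n); // population standard deviation (assumption)

    $score = $mean - $sd / $n;
    return max(0.0, $score / 100); // clamp so a weight is never negative
}

$weights = array(
    'C' => bet_weight(array(50, 50, 50, 50)),
    'D' => bet_weight(array(75, 90, 100, 25)),
    'G' => bet_weight(array(100, 100, 100, 100)),
);
print_r($weights);
```

Player C gets a weight of 0.5, which matches the "half a bet" example, and G gets a full weight of 1.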

I can't go into specifics here, but this project is related to the Wisdom of Crowds theory and the Delphi method. My goal is to predict the next results as well as possible by weighting past bets from several players.

I appreciate all input, thanks.

8 Answers
甜甜的少女心
#2 · 2020-06-06 01:38

Well, the "simple extension" is just the addition of a weight and a bound:

average(player) - min(upper, weight * entropy(player))
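As a sketch of what that could look like in PHP: the standard deviation stands in for "entropy" here, and the default values for `$weight` and `$upper` are purely illustrative, not recommendations.

```php
<?php
// Sketch only: average(player) - min(upper, weight * entropy(player)).
// Assumptions: population standard deviation plays the role of "entropy";
// $weight = 0.5 and $upper = 20.0 are arbitrary example values.

function spread(array $results): float
{
    $mean = array_sum($results) / count($results);
    $sumSquares = 0.0;
    foreach ($results as $r) {
        $sumSquares += ($r - $mean) ** 2;
    }
    return sqrt($sumSquares / count($results));
}

function bounded_score(array $results, float $weight = 0.5, float $upper = 20.0): float
{
    $mean = array_sum($results) / count($results);
    $penalty = min($upper, $weight * spread($results)); // penalty never exceeds $upper
    return $mean - $penalty;
}

// Player F: the raw penalty would be 25, so the cap of 20 applies.
echo bounded_score(array(100, 100, 0, 0)), "\n";
// Player D: the penalty stays below the cap.
echo bounded_score(array(75, 90, 100, 25)), "\n";
```

The bound keeps a very erratic player from being punished without limit, while the weight controls how much consistency matters at all.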

However, given the current data-set, I might not be concerned with "right answer percentage" so much as looking at the score difference per game, if that is an option.

一纸荒年 Trace。
#3 · 2020-06-06 01:39

I think you may be right that you want some sort of linear combination of the two factors, but I think we'd need to know more about what you're doing to know what the actual constants should be...

别忘想泡老子
#4 · 2020-06-06 01:41

First off, I would not use standard deviation if your data arrays have only a few entries. Use more robust statistical measures such as the Median Absolute Deviation (MAD). Likewise, you might want to test using the median instead of the average.

This is due to the fact that, if your "knowledge" of players' bets is limited to only a few samples, your data is going to be dominated by outliers, i.e. the player being lucky/unlucky. Statistical means may be entirely inappropriate under those circumstances and you may want to use some form of heuristic approach.
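A minimal sketch of the two robust measures, in case it helps; the function names are made up for this example.

```php
<?php
// Sketch only: median and Median Absolute Deviation (MAD).

function median(array $values): float
{
    sort($values); // sorts a local copy, the caller's array is untouched
    $n = count($values);
    $mid = intdiv($n, 2);
    if ($n % 2 === 1) {
        return $values[$mid];
    }
    return ($values[$mid - 1] + $values[$mid]) / 2;
}

function median_absolute_deviation(array $values): float
{
    $m = median($values);
    $deviations = array_map(function ($v) use ($m) {
        return abs($v - $m);
    }, $values);
    return median($deviations);
}

$d = array(75, 90, 100, 25);
printf("median: %.2f, MAD: %.2f\n", median($d), median_absolute_deviation($d));
```

For player D the single bad game (25) pulls the average down to 72.5, while the median stays at 82.5, which is the kind of outlier resistance meant here.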

I also assume from your links that you do not in fact intend to pick the best player. Rather, given the players' next set of answers "A", you want to predict the correct set of answers "C" by weighting "A" according to each player's previous track record.

Of course, if there were a good solution to this problem, you could make a killing on the stock market ;-) (The fact that no one does should tell you something about whether such a solution exists.)

But getting back to ranking the players: your main problem is that you (have to?) treat the percentage of right answers as evenly distributed over 0-100%. If the test contains multiple questions, this is certainly not the case. I would look at what a completely random player "R" scores on the test and build up a relative confidence number based on how much better or worse than "R" a given real player is.

Say, for each round of the game, generate a million random players and look at the distribution of their scores. Use that distribution to weight the players' real scores. Then combine the weighted scores using MAD and calculate Median - MAD / some number, as you already suggested.
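Here is one way the random-player baseline could be sketched. The question does not say how a game is structured, so the 20 questions with 4 choices each are invented for illustration, and 10,000 simulated players are used instead of a million to keep it quick. Turning the distribution into a weight via "fraction of random players beaten" is also just one possible reading.

```php
<?php
// Sketch only: compare real scores against simulated random players "R".
// Assumptions: a hypothetical test with $questions questions and $choices
// equally likely options per question; neither number comes from the question.

function simulate_random_scores(int $players, int $questions, int $choices): array
{
    $scores = array();
    for ($p = 0; $p < $players; $p++) {
        $correct = 0;
        for ($q = 0; $q < $questions; $q++) {
            if (mt_rand(1, $choices) === 1) { // a uniform guess is right 1 time in $choices
                $correct++;
            }
        }
        $scores[] = 100 * $correct / $questions; // percentage of right answers
    }
    return $scores;
}

// Fraction (0..1) of random players that scored strictly below $score.
function confidence(float $score, array $randomScores): float
{
    $beaten = 0;
    foreach ($randomScores as $r) {
        if ($r < $score) {
            $beaten++;
        }
    }
    return $beaten / count($randomScores);
}

mt_srand(42); // fixed seed so repeated runs give the same numbers
$random = simulate_random_scores(10000, 20, 4);

foreach (array(25, 50, 75, 100) as $score) {
    printf("score %3d beats %.1f%% of random players\n", $score, 100 * confidence($score, $random));
}
```

Under these made-up settings a score of 25% is roughly what guessing gives you, so it earns little confidence, while 50% already beats nearly every random player. The resulting weights could then be fed into the Median - MAD calculation.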

Animai°情兽
#5 · 2020-06-06 01:41

You can't get an optimal formula if you haven't quantified what "better" means. You need to figure out how you want to weigh consistency against average. For example, one option would be to estimate the score that the player will reach in a given percentage of games. This requires some kind of model of the probability distribution of the player's score. For instance, if we assume that a player's scores follow the normal distribution, then your first formula (average - standard_deviation / 2) gives the score the player will surpass about 70% of the time.
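To check the "about 70%" figure: under a normal model, the chance of scoring above average - k * standard_deviation is the standard normal CDF at k. The sketch below writes out an error-function approximation by hand (Abramowitz and Stegun formula 7.1.26) so it needs no extension; `erf_approx` and `normal_cdf` are names chosen for this example.

```php
<?php
// Sketch only: P(score > mean - k * sd) = Phi(k) under a normal model.
// erf_approx() is the Abramowitz-Stegun 7.1.26 approximation
// (absolute error around 1.5e-7), not an exact implementation.

function erf_approx(float $x): float
{
    $sign = $x < 0 ? -1 : 1;
    $x = abs($x);
    $t = 1 / (1 + 0.3275911 * $x);
    $poly = ((((1.061405429 * $t - 1.453152027) * $t + 1.421413741) * $t
        - 0.284496736) * $t + 0.254829592) * $t;
    return $sign * (1 - $poly * exp(-$x * $x));
}

// Standard normal cumulative distribution function.
function normal_cdf(float $z): float
{
    return 0.5 * (1 + erf_approx($z / sqrt(2)));
}

// Chance of beating "average - standard_deviation / 2", i.e. k = 0.5.
printf("%.3f\n", normal_cdf(0.5)); // prints 0.691
```

Changing k changes the guarantee: k = 1 corresponds to a score surpassed about 84% of the time, so the divisor in the formula is really a choice of how cautious you want to be.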

The star"
#6 · 2020-06-06 01:45

Would a Bayesian probability formula fit the bill?

I think it would. Here is a link to another site that is a little less mathematical about it: http://www.experiment-resources.com/bayesian-probability.html

Essentially, you are predicting the probability that each player will score the highest in the next round. This is what Bayesian probabilities eat for breakfast.

Bayesian probabilities are already in use in video games (warning: .doc file) to determine stuff just like this.

够拽才男人
#7 · 2020-06-06 01:50

Hm. This would rate a (100,100,100,60) player worse than an (85,85,85,85) player. Why not also take the percentage of total points into account?

Like: percentage of total points (e.g. 0..1) multiplied by your current calculation.
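A quick sketch of that idea, applied to average - standard_deviation / 2. It assumes every game is worth at most 100 points and uses the population standard deviation; `scaled_score` is a made-up name.

```php
<?php
// Sketch only: (share of total points, 0..1) * (average - standard_deviation / 2).
// Assumptions: each game is worth at most 100 points; population standard deviation.

function scaled_score(array $results): float
{
    $n = count($results);
    $mean = array_sum($results) / $n;

    $sumSquares = 0.0;
    foreach ($results as $r) {
        $sumSquares += ($r - $mean) ** 2;
    }
    $sd = sqrt($sumSquares / $n);

    $share = array_sum($results) / (100 * $n); // fraction of all available points
    return $share * ($mean - $sd / 2);
}

printf("%.2f\n", scaled_score(array(100, 100, 100, 60))); // prints 73.21
printf("%.2f\n", scaled_score(array(85, 85, 85, 85)));    // prints 72.25
```

Without the scaling the two players score about 81.34 and 85.00, so the steady player wins; with it, the player who collected more points overall edges ahead.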
