Users of my application (it's a game actually) answer questions to get points. Questions are supplied by other users. Due to volume, I cannot check everything myself, so I decided to crowd-source the filtering process to the users (players). The rules are simple:
- each user is shown a question to rate as good/bad/unsure
- when question is rated 5 times as "bad" it is removed from the pool
- when question is rated 5 times as "good" it is removed from the poll and flagged to be played by other players who have not seen it
If everyone could see everything, this would be easy. However, later in the game phase, users shouldn't get questions they have already seen. This means that users should not see all the questions, and exactly those they don't see would they get to play (answer) later in the game.
Total number of questions is much larger than number of players, new questions are added daily and new players come all the time, so I cannot just distribute in advance.
I'm looking for some algorithm that would maximize the number of rated playable (i.e. unseen) questions for all players.
I tried to google, but I'm not even sure which terms to put in the search box, and using stuff like "distribution", "voting", "collaborative filtering" gives very interesting but unusable results.
Ratio of good vs bad questions is 1:3, ie. 25% of questions are rated good. Number of already submitted unrated questions is over 10000. Number of active users with privilege to vote is around 150.
I'm currently considering splitting the question pool and user base into 2 parts. One part of the user base would check the question for the other part and vice versa. Splitting the questions is easy (odd vs even for example). However, I'm still not sure how to divide the user base. I thought about using odd/even position in "top question checkers" list, however the positions on list changes daily as new questions are checked.
Update: I just asked a sequel to this question - I need to periodically remove a fixed number of questions from the pool.
I'm unaware if there is a specific, well known algorithm for this. However this would be my line of thinking:
- "maximize the number of rated playable (i.e. unseen) questions for all players" means both maximising the number of questions with +5 and the number of not-seen questions from each player.
- Whatever the algorithm will be, its effectiveness is tied to both the quality of the questions submitted by the contributors and the willingness to rate by other players (unless you force them to rate questions).
- The goal of your system should not be that of making all players to have the same amount of "unseen questions" [this is in fact irrelevant], but rather that of always having for each player a "reserve" of unseen questions that allows him/her to play at its normal gamerate. For example: say you have two users A and B who play regularly on your site. A normally answers 80 quizzes per day, while B only 40. If your system in average get 100 new approved questions daily, in principle you would like player A to never see more than 20 of those every day, while player B could safely see 60 of them.
- The ratio between submitted question and approved question is also important: if every second submitted question is not good, then users A and B from above could rate 40 and 120 questions daily.
So my approach to the final algorithm would be:
- Keep track of the following:
- Number of submitted new question per day (F = Flow)
- Ratio between good/total submitted questions (Q = quality)
- Number of questions used (for playing, not for rating) by each player per day (GR = Game Rate)
- Number of questions rated by each player on a given day (RC = Review Counter)
- Establish a priority queue of questions to be rated. The goal here is to have approved questions as fast as possible. Give a bonus priority to both:
- questions that have collected upvotes already
- questions submitted by users who have a history of other questions having already been accepted.
- When a player is involved in rating, show him/her the first question in the queue.
- Repeat step 3 as much as you want making sure this condition is never met: Q * (F - RC) < GR
[The above can be further tweaked, considering the fact that when the user first register, there will be already a pool of approved but unseen questions on the site]
Of course you can heavily influence the behaviour of users by giving incentives for meritorious activity (badges and reputation points on SO are a self-explanatory example).
EDIT/ADDENDUM: The discussion in the comments clarify that the GR is fixed, and it is one question per day. Furthermore, the OP states that there will be at least 1 new approved question in the system every 24 hours. This means that it is possible to simplify the above algorithm in one of the two forms:
If the user can vote only AFTER he answered his daily question:
If there is at least one approved, unseen question in the system, let the user vote at will.
If the user can vote even BEFORE answering his daily question:
If there are at least two approved, unseen questions in the system, let the user vote at will.
This is such that if a user is voting all votable questions on the system and then answers his daily one at 23:59, there will still be a question available to be answered at 00:00, plus 24h time for the system to acquire a new question for the following day.
HTH!