How would I implement a ranking algorithm in my we

2020-07-18 07:55发布

问题:

I want to implement a ranking system on a website I've been working on and have decided to go with the Hacker News algorithm. My reasoning for choosing this algorithm is simply because it's been described here.

I was looking at this Python code (the language I'm using to build my site) and couldn't figure out how I would implement it.

def calculate_score(votes, item_hour_age, gravity=1.8):
    return (votes - 1) / pow((item_hour_age+2), gravity)

Given the tables:

posts:
    id | title | time_submitted

votes:
    id | postid | userid | score

How would I pull the data from the database? The ideal solution (most efficient) would be to construct a MySQL query to retrieve the top 10 posts ranked using the algorithm. But given that Hacker News has it implemented in Arc, it makes me think they pull out all the posts then run them through the algorithm to rank them.

Reddit also comes to mind for this... They use a non-relational database schema so I would assume they, like Hacker News, perform the rankings in their code - not the database.

How would you implement this?

EDIT: one post can have many votes as I would like to log which user votes on which post.

回答1:

You can use the data you need in the ORDER BY clause.

SELECT p.id, p.title, p.time_submitted, SUM(v.score) as num_votes 
  FROM posts p, votes v
 WHERE v.postid = p.id
GROUP BY p.id
ORDER BY 
   (SUM(v.score) - 1) / POW(TIMESTAMPDIFF(HOUR,p.time_submitted,NOW()) + INTERVAL 2 HOUR, 1.8) DESC
LIMIT 100


回答2:

In your case, the number of votes would be returned by:

SELECT count(*) FROM votes WHERE postid=<THE POST'S ID>;

If you want to consider score, you could include that in the query but the formula you provided is not equipped to handle it.

The item hour age is simply the current time subtracted from the time submitted:

SELECT HOUR(TIMEDIFF(NOW(), time_submitted)) FROM posts WHERE id=<THE POST'S ID>;

This can also be done entirely in SQL:

SELECT id FROM posts ORDER BY (((SELECT count(*) FROM votes WHERE postid=posts.id) - 1) / MOD(HOUR(TIMEDIFF(NOW(), time_submitted) + INTERVAL 2 HOURS), <GRAVITY>)) LIMIT 10;