Best way to update user rankings without killing t

2019-03-31 05:29发布

问题:

I have a website that has user ranking as a central part, but the user count has grown to over 50,000 and it is putting a strain on the server to loop through all of those to update the rank every 5 minutes. Is there a better method that can be used to easily update the ranks at least every 5 minutes? It doesn't have to be with php, it could be something that is run like a perl script or something if something like that would be able to do the job better (though I'm not sure why that would be, just leaving my options open here).

This is what I currently do to update ranks:

$get_users = mysql_query("SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
$i=0;
while ($a = mysql_fetch_array($get_users)) {
    $i++;
    mysql_query("UPDATE users SET month_rank = '$i' WHERE id = '$a[id]'");
}

UPDATE (solution):

Here is the solution code, which takes less than 1/2 of a second to execute and update all 50,000 rows (make rank the primary key as suggested by Tom Haigh).

mysql_query("TRUNCATE TABLE userRanks");
mysql_query("INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
mysql_query("UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.id");

回答1:

Make userRanks.rank an autoincrementing primary key. If you then insert userids into userRanks in descending rank order it will increment the rank column on every row. This should be extremely fast.

TRUNCATE TABLE userRanks;
INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC;
UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.id;


回答2:

My first question would be: why are you doing this polling-type operation every five minutes?

Surely rank changes will be in response to some event and you can localize the changes to a few rows in the database at the time when that event occurs. I'm pretty certain the entire user base of 50,000 doesn't change rankings every five minutes.

I'm assuming the "status = '1'" indicates that a user's rank has changed so, rather than setting this when the user triggers a rank change, why don't you calculate the rank at that time?

That would seem to be a better solution as the cost of re-ranking would be amortized over all the operations.

Now I may have misunderstood what you meant by ranking in which case feel free to set me straight.



回答3:

A simple alternative for bulk update might be something like:

set @rnk = 0;
update users 
set month_rank = (@rnk := @rnk + 1)
order by month_score DESC

This code uses a local variable (@rnk) that is incremented on each update. Because the update is done over the ordered list of rows, the month_rank column will be set to the incremented value for each row.



回答4:

Updating the users table row by row will be a time consuming task. It would be better if you could re-organise your query so that row by row updates are not required.

I'm not 100% sure of the syntax (as I've never used MySQL before) but here's a sample of the syntax used in MS SQL Server 2000

DECLARE @tmp TABLE
(
    [MonthRank] [INT] NOT NULL,
    [UserId] [INT] NOT NULL,
)

INSERT INTO @tmp ([UserId])
SELECT [id] 
FROM [users] 
WHERE [status] = '1' 
ORDER BY [month_score] DESC

UPDATE users 
SET month_rank = [tmp].[MonthRank]
FROM @tmp AS [tmp], [users]
WHERE [users].[Id] = [tmp].[UserId]

In MS SQL Server 2005/2008 you would probably use a CTE.



回答5:

Any time you have a loop of any significant size that executes queries inside, you've got a very likely antipattern. We could look at the schema and processing requirement with more info, and see if we can do the whole job without a loop.

How much time does it spend calculating the scores, compared with assigning the rankings?



回答6:

Your problem can be handled in a number of ways. Honestly more details from your server may point you in a totally different direction. But doing it that way you are causing 50,000 little locks on a heavily read table. You might get better performance with a staging table and then some sort of transition. Inserts into a table no one is reading from are probably going to be better.

Consider

mysql_query("delete from month_rank_staging;");
while(bla){
  mysql_query("insert into month_rank_staging values ('$id', '$i');");
}
mysql_query("update month_rank_staging src, users set users.month_rank=src.month_rank where src.id=users.id;");

That'll cause one (bigger) lock on the table, but might improve your situation. But again, that may be way off base depending on the true source of your performance problem. You should probably look deeper at your logs, mysql config, database connections, etc.



回答7:

Possibly you could use shards by time or other category. But read this carefully before...



回答8:

You can split up the rank processing and the updating execution. So, run through all the data and process the query. Add each update statement to a cache. When the processing is complete, run the updates. You should have the WHERE portion of the UPDATE reference a primary key set to auto_increment, as mentioned in other posts. This will prevent the updates from interfering with the performance of the processing. It will also prevent users later in the processing queue from wrongfully taking advantage of the values from the users who were processed before them (if one user's rank affects that of another). It also prevents the database from clearing out its table caches from the SELECTS your processing code does.