This question already has answers here:
Closed 2 years ago.
Suppose we have the following table, where each row represents a submission a user made during a programming contest, id
is an auto-increment primary key, probid
identifies the problem the submission was made to, score
is the number of points the submission earned for the problem, and date
is the timestamp when the submission was made. Each user can submit as many times as they want to the same problem:
+----+----------+--------+-------+------------+
| id | username | probid | score | date |
+----+----------+--------+-------+------------+
| 1 | brian | 1 | 5 | 1542766686 |
| 2 | alex | 1 | 10 | 1542766686 |
| 3 | alex | 2 | 5 | 1542766901 |
| 4 | brian | 1 | 10 | 1542766944 |
| 5 | jacob | 2 | 10 | 1542766983 |
| 6 | jacob | 1 | 10 | 1542767053 |
| 7 | brian | 2 | 8 | 1542767271 |
| 8 | jacob | 2 | 10 | 1542767456 |
| 9 | brian | 2 | 7 | 1542767522 |
+----+----------+--------+-------+------------+
In order to rank the contestants, we need to determine the best submission each user made to each problem. The "best" submission is the one with the highest score, with ties broken by submission ID (i.e., if the user got the same score on the same problem twice, we only care about the earlier of the two submissions). This would yield a table like the following:
+----------+--------+----+-------+------------+
| username | probid | id | score | date |
+----------+--------+----+-------+------------+
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
+----------+--------+----+-------+------------+
How can I write a query to accomplish this?
SELECT username , probid , id , score , `date`
FROM tableName
ORDER BY username, score DESC, ID
Using MySQL-8.0 or MariaDB-10.2 or later:
SELECT username, probid, id, score, `date`
FROM (
SELECT username, probid, id, score, `date`,
ROW_NUMBER() over (
PARTITION BY username,probid
ORDER BY score DESC) as `rank`
FROM tablename
) as tmp
WHERE tmp.`rank` = 1
This query will work on versions of MySQL prior to 8.0 as well. The LEFT JOIN
removes duplicate scores, ensuring that equal scores only have the lowest date in the result set for a given score. Then the WHERE
clause ensures that we have the maximum score for a given user/problem combination:
SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
LEFT JOIN tablename t2
ON t2.username = t1.username AND
t2.probid = t1.probid AND
t2.score = t1.score AND
t2.date < t1.date
WHERE t2.id IS NULL AND
t1.score = (SELECT MAX(score) FROM tablename t3 WHERE t3.username = t1.username AND t3.probid = t1.probid)
ORDER BY t1.username, t1.probid
Update
It's almost certainly more efficient to JOIN
the table to a list of maximum scores per user per problem first rather than computing the MAX
value for each row in the result table. This query does that instead:
SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
JOIN (SELECT username, probid, MAX(score) AS score
FROM tablename
GROUP BY username, probid) t2
ON t2.username = t1.username AND
t2.probid = t1.probid AND
t2.score = t1.score
LEFT JOIN tablename t3
ON t3.username = t1.username AND
t3.probid = t1.probid AND
t3.score = t1.score AND
t3.date < t1.date
WHERE t3.id IS NULL
ORDER BY t1.username, t1.probid
Output (for both queries):
username probid id score date
alex 1 2 10 1542766686
alex 2 3 5 1542766901
brian 1 4 10 1542766944
brian 2 7 8 1542767271
jacob 1 6 10 1542767053
jacob 2 5 10 1542766983
Updated Demo on SQLFiddle
In pre-MySQL 8.0.2, we can emulate Row_Number()
functionality using User-defined Variables. In this technique, we firstly get the data in a particular order (depends on the problem statement at hand).
In your case, within a partition of probid
and username
, we need to rank scores in descending order, with the row having lower timestamp value given higher priority (to break the ties). So, we will ORDER BY probid, username, score DESC, date ASC
.
Now, we can use this result-set as a Derived Table, and determine the row number. It will be like a Looping technique (which we use in application code, eg: PHP). We would store the previous row values in the User-defined variables, and use conditional CASE .. WHEN
expressions to check the current row's value(s) against the previous row. And, then assign row number accordingly.
Eventually, we will consider only those rows where row number is 1, and (if required), sort it by username
and probid
.
Query
SELECT dt2.username,
dt2.probid,
dt2.id,
dt2.score,
dt2.date
FROM (SELECT @rn := CASE
WHEN @un = dt1.username
AND @pid = dt1.probid THEN @rn + 1
ELSE 1
end AS row_no,
@un := dt1.username AS username,
@pid := dt1.probid AS probid,
dt1.id,
dt1.score,
dt1.date
FROM (SELECT id,
username,
probid,
score,
date
FROM your_table
ORDER BY username,
probid,
score DESC,
date ASC) AS dt1
CROSS JOIN (SELECT @un := '',
@pid := 0,
@rn := 0) AS user_init_vars) AS dt2
WHERE dt2.row_no = 1
ORDER BY dt2.username, dt2.probid;
Result
| username | probid | id | score | date |
| -------- | ------ | --- | ----- | ---------- |
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
View on DB Fiddle