Average of latest N records per group

Posted 2019-01-25 10:42

Question:

My current application calculates a point average based on all records for each user:

SELECT `user_id`, AVG(`points`) AS pts 
FROM `players` 
WHERE `points` != 0 
GROUP BY `user_id`

The business requirement has changed and I need to calculate the average based on the last 30 records for each user.

The relevant tables have the following structure:

table: players; columns: player_id, user_id, match_id, points

table: users; columns: user_id

The following query does not work, but it does demonstrate the logic that I am trying to implement.

SELECT @user_id := u.`id`, (
    -- Calculate the average for last 30 records
    SELECT AVG(plr.`points`) 
    FROM (
        -- Select the last 30 records for evaluation
        SELECT p.`points` 
        FROM `players` AS p 
        WHERE p.`user_id`=@user_id 
        ORDER BY `match_id` DESC 
        LIMIT 30
    ) AS plr
) AS avg_points 
FROM `users` AS u

Is there a fairly efficient way to calculate the averages based on the latest 30 records for each user?

Answer 1:

Try this:

SELECT user_id, AVG(points) AS pts 
FROM (SELECT user_id, IF(@uid = (@uid := user_id), @auto:=@auto + 1, @auto := 1) autoNo, points
      FROM players, (SELECT @uid := 0, @auto:= 1) A 
      WHERE points != 0 
      ORDER BY user_id, match_id DESC
     ) AS A 
WHERE autoNo <= 30
GROUP BY user_id;
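To make the logic of the query above easier to follow, here is a minimal Python sketch of what the user-variable counter computes. The function name `latest_n_average` and the `(user_id, match_id, points)` tuple layout are hypothetical, chosen only for illustration, and a numeric `match_id` is assumed.

```python
def latest_n_average(rows, n=30):
    """Average the latest n non-zero point values per user.

    rows: iterable of (user_id, match_id, points) tuples (hypothetical layout).
    """
    # Mirrors WHERE points != 0 and ORDER BY user_id, match_id DESC.
    kept = sorted(
        (r for r in rows if r[2] != 0),
        key=lambda r: (r[0], -r[1]),
    )
    totals = {}
    current_user, auto_no = None, 0
    for user_id, _match_id, points in kept:
        # Mirrors IF(@uid = (@uid := user_id), @auto := @auto + 1, @auto := 1):
        # the counter restarts at 1 whenever the user changes.
        auto_no = auto_no + 1 if user_id == current_user else 1
        current_user = user_id
        # Mirrors WHERE autoNo <= 30 in the outer query.
        if auto_no <= n:
            total, count = totals.get(user_id, (0, 0))
            totals[user_id] = (total + points, count + 1)
    return {user: total / count for user, (total, count) in totals.items()}


# Made-up sample data, with n=2 so the cut-off is visible.
sample = [
    (1, 1, 10), (1, 2, 20), (1, 3, 0), (1, 4, 30),
    (2, 1, 5), (2, 2, 15),
]
result = latest_n_average(sample, n=2)
```

With this sample, user 1 keeps matches 4 and 2 (the zero-point match 3 is filtered out first) and user 2 keeps both of its rows.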


Answer 2:

There is no reason to reinvent the wheel and risk ending up with buggy, suboptimal code. Your problem is a trivial extension of the common per-group limit problem. Tested and optimized solutions to it already exist, and from this resource I would recommend choosing one of the following two. These queries produce the latest 30 records for each player (rewritten for your tables):

select user_id, points
from players
where (
   select count(*) from players as p
   where p.user_id = players.user_id and p.player_id >= players.player_id
) <= 30;

(Just to make sure I understand your structure: I assume player_id is a unique key in the players table and that one user can appear in this table as multiple players.)

The second tested and optimized solution uses MySQL user variables:

set @num := 0, @user_id := -1;

select user_id, points,
      @num := if(@user_id = user_id, @num + 1, 1) as row_num, /* row_number is a reserved word in MySQL 8.0 */
      @user_id := user_id as dummy
from players force index(user_id) /* optimization */
group by user_id, points, player_id /* player_id should be necessary here */
having row_num <= 30;

The first query is not optimal (it is quadratic), while the second is optimal (one pass) but only works in MySQL. The choice is up to you. If you go for the second technique, test it carefully with your keys and database settings; the resource notes that in some circumstances it might stop working.
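For comparison, a different technique from the two recommended above is the `ROW_NUMBER()` window function, which numbers rows per group without user variables. It is available in MySQL 8.0 and later and in SQLite 3.25 and later. The sketch below runs it through Python's `sqlite3` module purely so it can be tried without a MySQL server; the sample rows, the `average_latest` helper, and the choice of `match_id` as the ordering column are assumptions made for illustration.

```python
import sqlite3

# Made-up sample rows: (player_id, user_id, match_id, points).
SAMPLE = [
    (1, 1, 1, 10), (2, 1, 2, 20), (3, 1, 3, 30),
    (4, 2, 1, 5), (5, 2, 2, 15), (6, 2, 3, 25),
    (7, 3, 1, 8),
]

# Number each user's rows from newest to oldest, keep the first :n, then average.
QUERY = """
SELECT user_id, AVG(points) AS pts
FROM (
    SELECT user_id, points,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY match_id DESC) AS rn
    FROM players
) AS ranked
WHERE rn <= :n
GROUP BY user_id
ORDER BY user_id
"""


def average_latest(n):
    """Return [(user_id, average of the latest n points), ...] (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players ("
        "player_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "match_id INTEGER, points INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", SAMPLE)
    rows = conn.execute(QUERY, {"n": n}).fetchall()
    conn.close()
    return rows


rows = average_latest(2)
```

In MySQL 8.0 the same SELECT should work with `:n` replaced by a literal such as 30. Users with fewer than `n` rows are still included, averaged over whatever rows they have.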

Your final query is trivial:

select user_id, avg(points)
from ( /* here goes one of the above solutions; 
          the "set" commands should go before this big query */ ) as t
group by user_id

Note that I have not incorporated the condition from your first query (points != 0), because I don't fully understand your requirement (you haven't described it), and I also think this answer should be general enough to help others with a similar problem.
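As a rough illustration of how the pieces fit together, the sketch below places the first (counting subquery) solution inside the final averaging query. It runs on an in-memory SQLite database through Python's `sqlite3` module only so that it is self-contained; the sample rows, the `cur` alias for the outer table, and the `average_latest_by_count` helper are assumptions, not part of the original answer.

```python
import sqlite3

# Made-up sample rows: (player_id, user_id, match_id, points).
SAMPLE = [
    (1, 1, 1, 10), (2, 1, 2, 20), (3, 1, 3, 30),
    (4, 2, 1, 5), (5, 2, 2, 15), (6, 2, 3, 25),
    (7, 3, 1, 8),
]

# Inner query: keep a row only if at most :n rows of the same user have an
# equal or higher player_id (that is, it is among the latest :n).
# Outer query: average the surviving rows per user.
QUERY = """
SELECT user_id, AVG(points) AS pts
FROM (
    SELECT cur.user_id, cur.points
    FROM players AS cur
    WHERE (
        SELECT COUNT(*)
        FROM players AS p
        WHERE p.user_id = cur.user_id
          AND p.player_id >= cur.player_id
    ) <= :n
) AS t
GROUP BY user_id
ORDER BY user_id
"""


def average_latest_by_count(n):
    """Return [(user_id, average of the latest n points), ...] (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players ("
        "player_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "match_id INTEGER, points INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", SAMPLE)
    averages = conn.execute(QUERY, {"n": n}).fetchall()
    conn.close()
    return averages


averages = average_latest_by_count(2)
```

The correlated COUNT(*) is evaluated once per row, which is where the quadratic cost mentioned earlier comes from; an index on (user_id, player_id) would likely help, though that has not been measured here.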



Answer 3:

This should work:

SELECT p1.user_id, avg(points) as pts
  FROM players p1, (
    SELECT u.user_id, (
         SELECT match_id
           FROM players p2
          WHERE p2.user_id = u.user_id
          ORDER BY match_id DESC
          LIMIT 29, 1 ) mid
      FROM users u
    HAVING mid IS NOT NULL) m
 WHERE p1.user_id = m.user_id
   AND p1.match_id >= m.mid
 GROUP BY p1.user_id

 UNION ALL

SELECT user_id, avg(points) AS pts 
  FROM players
 GROUP BY user_id
HAVING count(*) < 30

The part after the UNION ALL is only necessary if you need to include users with fewer than 30 records.



Answer 4:

If I understand your logic correctly, you need to calculate the average score for every user based on the last 30 records (ordered by match_id) that have non-zero points.

First of all, you need to return the last 30 records for each user, and you could use a query like this:

SELECT p.user_id, p.match_id, p.points
FROM
  players p INNER JOIN players c
  ON p.user_id=c.user_id AND p.match_id<=c.match_id
     AND p.points!=0 AND c.points!=0
GROUP BY
  p.user_id, p.match_id, p.points
HAVING
  COUNT(c.user_id)<=30

Then you need to calculate the average on the previous query:

SELECT user_id, AVG(points)
FROM (
  SELECT p.user_id, p.match_id, p.points
  FROM
    players p INNER JOIN players c
    ON p.user_id=c.user_id AND p.match_id<=c.match_id
       AND p.points!=0 AND c.points!=0
  GROUP BY
    p.user_id, p.match_id, p.points
  HAVING
    COUNT(c.user_id)<=30
  ) l
GROUP BY user_id


Answer 5:

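SELECT 
u.`user_id`, 
-- Note: LIMIT is applied after AVG() has aggregated, so this averages all of the user's rows
(SELECT AVG(p.`points`) FROM `players` AS p WHERE p.`user_id`=u.`user_id` 
ORDER BY p.`user_id` DESC LIMIT 30) AS AVG
FROM `users` AS u GROUP BY u.`user_id`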

and also try this too...