MySQL group by and max returns wrong rows

2019-06-16 18:24发布

问题:

I have two tables and I try to find the "post" with the highest score per day.

CREATE TABLE IF NOT EXISTS `posts_points` (
  `post_id` int(10) unsigned NOT NULL,
  `comments` smallint(5) unsigned NOT NULL,
  `likes` smallint(5) unsigned NOT NULL,
  `favorites` smallint(5) unsigned NOT NULL,
   PRIMARY KEY (`post_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


CREATE TABLE IF NOT EXISTS `posts` (
  `profile_id` int(10) unsigned NOT NULL,
  `post_id` int(10) unsigned NOT NULL,
  `pubdate_utc` datetime NOT NULL,
  PRIMARY KEY (`post_id`),
  KEY `profile_id` (`profile_id`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

I have tried the query below. It returns the correct score but the other columns are just random rows. What am I doing wrong ?

SELECT p.post_id, p.profile_id
   , MAX(t1.score)
   , DATE_FORMAT(t1.pubdate_utc, '%d %b') post_date
   , DATE(t1.pubdate_utc) mydate
FROM
(
   SELECT p.profile_id, p.post_id, p.pubdate_utc
      , (pp.comments + pp.likes + pp.favorites) AS score
   FROM posts p 
   INNER JOIN posts_points pp ON p.post_id = pp.post_id
) t1
INNER JOIN posts p ON t1.post_id = p.post_id
   AND t1.pubdate_utc = p.pubdate_utc
GROUP BY mydate
ORDER BY mydate DESC
LIMIT 18;

回答1:

I run into this problem all the time. When MySQL runs an aggregate function, for any non-aggregated columns, it simply pulls the first data it runs across for that group, whether it is from the MAX row or not. So what you have to do is order the data in an inner query such that the maxes are first in their groups. See if this works for you:

SELECT t.post_id,
       t.profile_id,
       t.score,
       t.pubdate_utc
FROM (SELECT p.profile_id,
             p.post_id,
             p.pubdate_utc,
             (pp.comments + pp.likes + pp.favorites) score
      FROM posts p
      JOIN posts_points pp ON p.post_id = pp.post_id
      WHERE p.pubdate_utc >= DATE_ADD(DATE(NOW()), INTERVAL -17 DAY)
      ORDER BY score DESC
     ) t
GROUP BY DATE(t.pubdate_utc) DESC
;

Notice that I use no MAX function here. Ordering by score descending and then grouping by date in the outer query will pull up the highest score by date. Also notice that I put the WHERE clause in the inner query. Inner queries like this (tho sometimes necessary) are not very efficient, since they have no indexes for the outer query to optimize on, so make sure your inner result set is as small as it can be. Lastly, notice the GROUP BY DATE(t.pubdate_utc). If I did not reduce it down to just the date information, there would be a lot more than 18 results, as times are also counted then.

Edit: Changed to INTERVAL -17 DAY to give up to 18 results instead of 19.



回答2:

Column1  Column2
C        d
A        any thing
D        y
B        z  

If you order this data by Column1 then it looks like this .....orderby just orders the first column....

Column1  Column2
A        any thing
B        z            
C        d
D        y


回答3:

is a little tricky to understand what you want to do.

The words (columns (posts, comments, favorites) and PK), I understood that you update the values ​​increasing, and does not record each vote.

This select returns the data from the post, and the score, by ordering bigger point, limited to 18.

        SELECT P.post_id,
               P.profile_id,
               (PP.comments + PP.likes + PP.favorites) AS score,
               DATE_FORMAT (P.pubdate_utc, '%d %b') AS post_data,
               DATE (P.pubdate_utc) AS mydate
          FROM posts P
    INNER JOIN posts_points PP
            ON (= P.post_id PP.post_id)
      ORDER BY 3 DESC
         LIMIT 18;

If you want to select the most votes of the day, you must record the likes / favs different, needs data in that table (posts_points).



回答4:

Wow! Tricky. For example, there is always the possibility of ties for max.

The solution below creates an intermediate list of day's max_scores, then gets all posts whose scores are equal to the max for their day. It returns ties, so you may get two rows for a given day. I beg your forgiveness that I can't test this, so give feedback, and I'm sure we can get this to do what you need.

SELECT p.profile_id, p.post_id, p.pubdate_utc
, DATE_FORMAT(p.pubdate_utc, '%d %b') AS post_date
, DATE(p.pubdate_utc) AS mydate
, (pp.comments + pp.likes + pp.favorites) AS score
FROM posts p 
INNER JOIN posts_points pp ON p.post_id = pp.post_id
INNER JOIN 
(
    SELECT p.pubdate_utc AS max_date, 
    (pp.comments + pp.likes + pp.favorites) AS max_score
    FROM posts p2 
    INNER JOIN posts_points pp2 ON p2.post_id = pp2.post_id
) m ON score = m.max_score
AND mydate = m.max_date
ORDER BY mydate DESC
LIMIT 18;


回答5:

you can see this query .Inner query is firstly fetch those rows which have same post_id in both table than sum (pp.comments + pp.likes + pp.favorites) as score .Outer Query is fetch max score and doing group by on the date wise....

SELECT post_id, profile_id
   , MAX(score)
   , DATE_FORMAT(pubdate_utc, '%d %b') post_date
   , DATE(pubdate_utc) as mydate
FROM
(
   SELECT p.profile_id, p.post_id, p.pubdate_utc
      , (pp.comments + pp.likes + pp.favorites) AS score
   FROM posts p 
   INNER JOIN posts_points pp ON p.post_id = pp.post_id
) 
GROUP BY pubdate_utc
ORDER BY pubdate_utc DESC