how to get distinct rows with max value

2019-01-24 18:28发布

问题:

my apologies for asking what must be very simple to solve, but I just can't seem to wrap my mind around this.. I couldn't even come up with a really fitting title for my question, beg your pardons for that as well.

I have a poll where each user may post multiple answers to a question and others then vote on these answers. I need to get a result where the highest-voted answer of every user gets returned.

Test case: let's assume a question like "What is your most favourite song quote?"

CREATE TABLE `answers` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`authorId` INT,
`answer` TEXT NOT NULL ,
`votes` INT NOT NULL 
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO `answers` VALUES
(1, 30, "The West is the Best", 120),
(2, 30, "Come on, baby, light my fire", 100),
(3, 31, "Everything's gonna be allright", 350),
(4, 31, "Sayin' oooh, I love you", 350),
(5, 31, "Singing sweet songs of melodies pure and true", 290),
(6, 32, "I'm your pole and all you're wearing is your shoes", 540),
(7, 32, "And I'm crazier when I'm next to her", 180),
(8, 32, "You hear the music in the air", 230),
(9, 30, "You know they are a liar", 190)

The result I expect to get is:

id | authorId | answer                                             | votes
 6 |       32 | I'm your pole and all you're wearing is your shoes | 540
 3 |       31 | Everything's gonna be allright                     | 350
 9 |       30 | You know they are a liar                           | 190

Basically I need to select the best answer for each author and then sort the result by the votes the best answers got. It may happen that two answers by the same author have the same number of votes; then only the first one posted (lower ID) should be selected (as demonstrated with answers #3 and #4). Two different answers by the same author may never appear in the result - each author may only win once.

I've searched and searched and tried and tried again and at the moment I feel pretty brainwashed.. it may be that this is not doable in a single SQL query; should that be the case, it might be worth pointing out that the application is written in PHP. I know I could just grab all the answers with ORDER BY votes desc, id asc and then iterate over the result, remember all the authorIds and chuck out any row with an authorId I've already seen, but I need to obtain a set number of records and that could get awkward (... would possibly need running the query again with an offset if I chuck out too many rows etc).. but in the end it might be the best solution if the single-query solution was overly complex or none at all...

Any ideas? :o)

回答1:

SELECT id, authorId, answer, votes
FROM (  SELECT id, authorId, answer, votes
        FROM answers
        ORDER BY votes DESC) AS h
GROUP BY authorId

This little neat trick is built basing on GROUP BY to retrieve the first row of each case. Usually this is by default ORDER BY id ASC, but through this sub query, the first row in each authorId with the highest votes.

Note: As mentioned by Iain Elder, this solution doesn't work with ONLY_FULL_GROUP_BY active and only works in MySQL. This solution is to a certain extent unsupported due to lack of documentation confirming this behavior. It works well for me and has always worked well for me however.

This method still works on the latest MySQL on sqlfiddle.



回答2:

You can use a sub-select:

select min(a1.id), a1.authorid, a2.mxvotes
from answers a1
inner join
(
  select authorid, max(votes) mxvotes
  from answers
  group by authorid
) a2
  on a1.authorid = a2.authorid
  and a1.votes = a2.mxvotes
group by a1.authorid, a2.mxvotes
order by mxvotes desc

see SQL Fiddle with Demo



回答3:

Great question, Dan.

MySQL lacks the analytical functions to make this easy to solve. A similar question has been asked of Oracle and was solved using the OVER clause with the MAX function. This works on SQL Server too.

You need to use subqueries to do it on MySQL. This works for me:

SELECT
  id,
  authorId,
  answer,
  votes
FROM answers AS firsts
WHERE id = (
  SELECT
    MIN(id)
  FROM answers AS favorites
  WHERE
    votes = (
      SELECT MAX(votes)
      FROM answers AS author_max
      WHERE author_max.authorId = favorites.authorID
    ) AND
    favorites.authorId = firsts.authorId 
)
ORDER BY votes DESC;

See my sqlfiddle for an executable example.



回答4:

select * from (select * from answers order by votes desc) as temp group by authorId