For common elements, how to find the value based o

2019-09-17 11:06发布

问题:

I am using DB2 by IBM; however I am hoping this can be done using the SQL standard without using the additional functions added by each DB manufacturer.

My Problem:

For every CID (customer ID) I need to find the CLUB from which the most purchases were made.

In the table below, customer #1 ('cid =1') bought the most books from the club 'Readers Digest'. I can get this part done using: Find most frequent value in SQL column

However CID=2 is much harder (for me), and I am stuck. For 'cid = 2' we can see that the most commonly occurring sting under the 'CLUB' column is a tie between 'Oprah' and 'YRB Gold' - but when we look at the 'QNTY' column it becomes apparent (to people) that the final answer for 'cid=2' is 'YRB_gold'. 'YRB_GOLD' sold 5 books, where 'OPRAH' sold only 4 books.

I put 'cid=3' here as well:

There are 47 customers (cid). My question is: how do I step through every 'CID', and consider the most commonly occurring string under 'CLUB', while at the same time also keeping track of the 'QNTY' for every club? If you choose to help, could you please offer the most OEM neutral answer? Thank you in advance.

******************************EDIT #1*****************************************

Thanks for the answer below. Currently I am having an issue making it work.

The table name is 'yrb_purchase'. When I type everything in: SELECT cid, club, qnty AS q1 FROM (SELECT cid, club, sum(qnty) AS q2, row_number() OVER (PARTITION BY cid ORDER BY sum(q2) DESC) AS seqnum FROM yrb_purchase GROUP BY cid, club) cc WHERE seqnum = 1

I get an error: SQL0206N "Q2" is not valid in the context where it is used. SQLSTATE=42703

currently working on trying to figure out how to fix that.

回答1:

The value that you are looking for has a name. Statistically, it is called the "mode".

If I understand correctly, you want the total quantity for each club, and then the one with the highest for each cid. You can use row_number() or rank(), depending on how you want to handle ties:

select cid, club, qty
from (select cid, club, sum(qty) as qty,
             row_number() over (partition by cid order by sum(qty) desc) as seqnum
      from t
      group by cid, club
     ) cc
where seqnum = 1;

If there are ties and you want all values, then use rank() instead of row_number().



标签: sql db2