How to write “greatest n per group” type query, but

Posted 2019-04-09 01:48

Question:

Yesterday I asked a question about writing a "greatest n per group" type query (at the time I didn't know it was called "greatest n per group"), except that I asked how to get the least value per group. The table structure was as follows:

type    name    value
=====================
1       item1    1
1       item2    20
2       item3    0
3       item4    5
3       item5    2
3       item6    50

I received several great answers, and the most helpful one was this:

SELECT t1.type, t1.name, t1.value
FROM mytable t1
LEFT JOIN mytable t2 ON t1.type = t2.type AND t1.value > t2.value
WHERE t2.value IS NULL

The above query results in this:

type    name    value
=====================
2       item3    0
1       item1    1
3       item5    2
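The query above is an anti-join. Each row t1 is paired with any row t2 of the same type that has a smaller value. Rows that find no such partner (t2 is NULL) are the minimum of their group.

The following minimal sketch reproduces the result using SQLite through Python's built-in `sqlite3` module. SQLite is an assumption, since the question does not name a database. The `ORDER BY` is added here only to make the output order deterministic, because the query as written leaves row order unspecified.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (type INTEGER, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "item1", 1), (1, "item2", 20), (2, "item3", 0),
     (3, "item4", 5), (3, "item5", 2), (3, "item6", 50)],
)

# Anti-join: a t1 row survives only if no t2 row of the same type is smaller.
LEAST_PER_GROUP = """
SELECT t1.type, t1.name, t1.value
FROM mytable t1
LEFT JOIN mytable t2 ON t1.type = t2.type AND t1.value > t2.value
WHERE t2.value IS NULL
ORDER BY t1.value
"""

rows = conn.execute(LEAST_PER_GROUP).fetchall()
for row in rows:
    print(row)
# (2, 'item3', 0)
# (1, 'item1', 1)
# (3, 'item5', 2)
```

If two rows in a group tie for the lowest value, neither is strictly smaller than the other, so both are returned.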

However, since asking that question, I realized that I had left out an important requirement, and I can't figure out how to add it to the query above. Instead of simply selecting the row with the lowest value in one column per group, I need to select the row with the lowest value per group among only those rows whose value in another column is greater than some minimum.

Here is my new question/problem:


I have the following table (products):

+-----------------------------------------------------------+
|   id   |   type   |   name   |   popularity   |   price   |
+-----------------------------------------------------------+
|    0   |    0     |   item1  |      3.5       |   0.99    |
|    3   |    1     |   item2  |      3         |   1.99    |
|    4   |    1     |   item3  |      6         |   2.95    |
|    6   |    1     |   item4  |      9         |   2.50    |
|    9   |    1     |   item5  |      12        |   3.75    |
|    12  |    2     |   item6  |      16        |   5.25    |
|    13  |    2     |   item7  |      32        |   10.95   |
|    14  |    2     |   item8  |      48        |   7.50    |
+-----------------------------------------------------------+

I need to get the lowest-priced item in each group (the groups being the different values of type), considering only items whose popularity is greater than some amount. If no item in a group has a popularity greater than the specified amount, then no item from that group should be returned. That last part is how this question differs from my previous one. Preferably, the results should be sorted by price in ascending order.

So, if the popularity needs to be greater than 3, the result should be:

+-----------------------------------------------------------+
|   id   |   type   |   name   |   popularity   |   price   |
+-----------------------------------------------------------+
|    0   |    0     |   item1  |      3.5       |   0.99    |
|    6   |    1     |   item4  |      9         |   2.50    |
|    12  |    2     |   item6  |      16        |   5.25    |
+-----------------------------------------------------------+

If the popularity needs to be greater than 6, then the result should be:

+-----------------------------------------------------------+
|   id   |   type   |   name   |   popularity   |   price   |
+-----------------------------------------------------------+
|    6   |    1     |   item4  |      9         |   2.50    |
|    12  |    2     |   item6  |      16        |   5.25    |
+-----------------------------------------------------------+

Hopefully I did both of those examples correctly. In any case, I think you get the idea.
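Both expected results can be cross-checked outside SQL. The following plain-Python sketch applies the rule directly. It drops rows at or below the popularity threshold, keeps the cheapest remaining row per type, and sorts by price. The function name `cheapest_popular_per_type` is hypothetical and chosen for this illustration.

One difference from SQL is worth noting: on a price tie this sketch keeps only the first row, whereas the SQL anti-join returns all tied rows.

```python
# Sample rows from the question: (id, type, name, popularity, price).
PRODUCTS = [
    (0, 0, "item1", 3.5, 0.99),
    (3, 1, "item2", 3, 1.99),
    (4, 1, "item3", 6, 2.95),
    (6, 1, "item4", 9, 2.50),
    (9, 1, "item5", 12, 3.75),
    (12, 2, "item6", 16, 5.25),
    (13, 2, "item7", 32, 10.95),
    (14, 2, "item8", 48, 7.50),
]


def cheapest_popular_per_type(rows, min_popularity):
    """Cheapest row per type among rows with popularity > min_popularity,
    sorted by price ascending; types with no qualifying row are omitted."""
    best = {}
    for row in rows:
        _id, type_, _name, popularity, price = row
        if popularity <= min_popularity:
            continue  # filter first, so unpopular rows never compete on price
        if type_ not in best or price < best[type_][4]:
            best[type_] = row
    return sorted(best.values(), key=lambda r: r[4])


print([r[2] for r in cheapest_popular_per_type(PRODUCTS, 3)])  # ['item1', 'item4', 'item6']
print([r[2] for r in cheapest_popular_per_type(PRODUCTS, 6)])  # ['item4', 'item6']
```

Both outputs agree with the two tables above.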

Would it be possible to do what I'm asking in one query?

Answer 1:

This requires some attention to where you put the condition (in WHERE or in ON?) so that you don't get tricked :-) You need to add the condition on t1 to the WHERE clause and the condition on t2 to the ON clause:

SELECT t1.id, t1.type, t1.name, t1.popularity, t1.price
FROM products t1
LEFT JOIN products t2 ON t1.type = t2.type AND t1.price > t2.price
    AND t2.popularity > 3 /* here */
WHERE t2.price IS NULL
    AND t1.popularity > 3 /* and here */
ORDER BY t1.price

Haven't tested it, but it should work.
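Because the query is untested, the following minimal sketch runs it against the products rows from the question, written against the products table and sorted by price. It assumes SQLite via Python's built-in `sqlite3` module, since the question does not name a database. The popularity threshold is passed as a named parameter, `:min_pop`, a name chosen for this illustration. The same parameter is used in both the ON and WHERE clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, type INTEGER, "
    "name TEXT, popularity REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(0, 0, "item1", 3.5, 0.99), (3, 1, "item2", 3, 1.99),
     (4, 1, "item3", 6, 2.95), (6, 1, "item4", 9, 2.50),
     (9, 1, "item5", 12, 3.75), (12, 2, "item6", 16, 5.25),
     (13, 2, "item7", 32, 10.95), (14, 2, "item8", 48, 7.50)],
)

# The threshold appears twice: in ON it limits which t2 rows count as cheaper
# competitors; in WHERE it limits which t1 rows are candidates at all.
QUERY = """
SELECT t1.id, t1.type, t1.name, t1.popularity, t1.price
FROM products t1
LEFT JOIN products t2 ON t1.type = t2.type AND t1.price > t2.price
    AND t2.popularity > :min_pop
WHERE t2.price IS NULL
    AND t1.popularity > :min_pop
ORDER BY t1.price
"""


def cheapest_ids(min_pop):
    """Ids of the cheapest sufficiently popular item per type."""
    return [row[0] for row in conn.execute(QUERY, {"min_pop": min_pop})]


print(cheapest_ids(3))  # [0, 6, 12]
print(cheapest_ids(6))  # [6, 12]
```

These ids correspond to item1, item4, and item6 for a threshold of 3, and to item4 and item6 for a threshold of 6. Both results match the expected tables in the question.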

Attempt at an explanation: the condition in the WHERE clause affects which rows you consider as candidates for the lowest value. The condition in the ON clause, by contrast, affects the linkage: which other rows do you want to compare each candidate against? It defines the group within which you compare. Technically, it determines when t2.* will be NULL. Had you put the condition on t2.popularity in the WHERE clause instead, the join would compare each candidate against every row of its group, including unpopular ones. For groups whose cheapest element has low popularity, t2 would then never be NULL, so you would not find the element with the lowest value. Worse, t2.popularity > 3 evaluates to NULL (not true) whenever t2 is NULL, so combined with the IS NULL test it would filter out every row.
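To see why the placement matters, the following sketch runs three variants of the t2 popularity condition against the question's products rows, with the threshold fixed at 3:

- the condition in ON;
- the condition moved to WHERE;
- the condition dropped entirely.

As before, it assumes SQLite via Python's `sqlite3` module, and the variant labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, type INTEGER, name TEXT, "
    "popularity REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(0, 0, "item1", 3.5, 0.99), (3, 1, "item2", 3, 1.99),
     (4, 1, "item3", 6, 2.95), (6, 1, "item4", 9, 2.50),
     (9, 1, "item5", 12, 3.75), (12, 2, "item6", 16, 5.25),
     (13, 2, "item7", 32, 10.95), (14, 2, "item8", 48, 7.50)],
)

BASE = """
SELECT t1.id
FROM products t1
LEFT JOIN products t2 ON t1.type = t2.type AND t1.price > t2.price {on_extra}
WHERE t2.price IS NULL AND t1.popularity > 3 {where_extra}
ORDER BY t1.price
"""

VARIANTS = {
    # Only popular rows count as cheaper competitors: the intended result.
    "t2 condition in ON": ("AND t2.popularity > 3", ""),
    # When t2 is NULL, t2.popularity > 3 is NULL (not true), so no row passes.
    "t2 condition in WHERE": ("", "AND t2.popularity > 3"),
    # Unpopular item2 (price 1.99) still undercuts item4, so type 1 vanishes.
    "no t2 condition": ("", ""),
}

results = {}
for label, (on_extra, where_extra) in VARIANTS.items():
    sql = BASE.format(on_extra=on_extra, where_extra=where_extra)
    results[label] = [row[0] for row in conn.execute(sql)]
    print(f"{label}: {results[label]}")
# t2 condition in ON: [0, 6, 12]
# t2 condition in WHERE: []
# no t2 condition: [0, 12]
```

Only the ON placement returns the cheapest popular item for every type, including type 1.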