I have a table with the following structure:
Contents (
id
name
desc
tdate
categoryid
...
)
I need to do some statistics with the data in this table. For example I want to get number of rows with the same category by grouping and id of that category. Also I want to limit them for n
rows in descending order and if there are more categories available I want to mark them as "Others". So far I have come out with 2 queries to database:
Select n
rows in descending order:
SELECT COALESCE(ca.NAME, 'Unknown') AS label
,ca.id AS catid
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
GROUP BY label
,catid
ORDER BY data DESC LIMIT 7
Select other rows as one:
SELECT 'Others' AS label
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
WHERE c.categoryid NOT IN ($INCONDITION)
But when I have no category groups left in db table I still get an "Others" record. Is it possible to make it in one query and make the "Others" record optional?
The quick fix, to make the
'Others'
row conditional would be to add a simpleHAVING
clause to that query.(If there are no other rows in the
contents
table, thenCOUNT(c.id)
is going to be zero.)That only answers half the question, how to make the return of that row conditional.
The second half of the question is a little more involved.
To get the whole resultset in one query, you could do something like this
(this is not tested yet; desk checked only.. I'm not sure if postgresql accepts a LIMIT clause in an inline view... if it doesn't we'd need to implement a different mechanism to limit the number of rows returned.
The inline view
t
basically gets the same result as the first query, a list of (up to) 7id
values from category table, or 6id
values from category table and a NULL.The outer query basically does the same thing, joining
content
withcategory
, but also doing a check if there's a matching row fromt
. Becauset
might be returning a NULL, we have a slightly more complicated comparison, where we want a NULL value to match a NULL value. (MySQL conveniently gives us shorthand operator for this, the null-safe comparison operator<=>
, but I don't think that's available in postgresql, so we have to express differently.The next bit is getting a GROUP BY to work, we want to group by the 7 values returned by the inline view
t
, or, if there's not matching value fromt
, group the "other" rows together. We can get that to happen by using a boolean expression in the GROUP BY clause.We're basically saying "group by 'if there was a matching row from t'" (true or false) and then group by the row from 't'. Get a count, and then order by the count descending.
This isn't tested, only desk checked.
You can approach this with nested aggregation. The inner aggregation calculates the counts along with a sequential number. You want to take everything whose number is 7 or less and then combine everything else into the
others
category:The specific difficulty here: Queries with one or more aggregate functions in the
SELECT
list and noGROUP BY
clause produce exactly one row, even if no row is found in the underlying table.There is nothing you can do in the
WHERE
clause to suppress that row. You have to exclude such a row after the fact, i.e. in theHAVING
clause, or in an outer query.Per documentation:
It should be noted that adding a
GROUP BY
clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':
You need to break ties if multiple categories can have the same count across the 7th / 8th rank. In my example, categories with the smaller
categoryid
win such a race.Parentheses are required to include a
LIMIT
orORDER BY
clause to an individual leg of aUNION
query.You only need to join to table
category
for the top 7 categories. And it's generally cheaper to aggregate first and join later in this scenario. So don't join in the the base query in the CTE (common table expression) namedcte
, only join in the firstSELECT
of theUNION
query, that's cheaper.Not sure why you need the
COALESCE
. If you have a foreign key in place fromcontents.categoryid
tocategory.id
and bothcontents.categoryid
andcategory.name
are definedNOT NULL
(like they probably should be), then you don't need it.The odd
GROUP BY true
This would work, too:
And I even get slightly faster query plans. But it's a rather odd hack ...
SQL Fiddle demonstrating all.
Related answer with more explanation for the
UNION ALL
/LIMIT
technique: