SQL query to return a grouped result as a single r

2019-03-02 01:57发布

If I have a jobs table like:

|id|created_at  |status    |
----------------------------
|1 |01-01-2015  |error     |
|2 |01-01-2015  |complete  |
|3 |01-01-2015  |error     |
|4 |01-02-2015  |complete  |
|5 |01-02-2015  |complete  |
|6 |01-03-2015  |error     |
|7 |01-03-2015  |on hold   |
|8 |01-03-2015  |complete  |

I want a query that will group them by date and count the occurrence of each status and the total status for that date.

SELECT created_at status, count(status), created_at 
FROM jobs 
GROUP BY created_at, status;

Which gives me

|created_at  |status    |count|
-------------------------------
|01-01-2015  |error     |2
|01-01-2015  |complete  |1
|01-02-2015  |complete  |2
|01-03-2015  |error     |1
|01-03-2015  |on hold   |1
|01-03-2015  |complete  |1   

I would like to now condense this down to a single row per created_at unique date with some sort of multi column layout for each status. One constraint is that status is any one of 5 possible words but each date might not have one of every status. Also I would like a total of all statuses for each day. So desired results would look like:

|date        |total |errors|completed|on_hold|
----------------------------------------------
|01-01-2015  |3     |2     |1        |null   
|01-02-2015  |2     |null  |2        |null
|01-03-2015  |3     |1     |1        |1

the columns could be built dynamically from something like

SELECT DISTINCT status FROM jobs;

with a null result for any day that doesn't contain any of that type of status. I am no SQL expert but am trying to do this in a DB view so that I don't have to bog down doing multiple queries in Rails.

I am using Postresql but would like to try to keep it straight SQL. I have tried to understand aggregate function enough to use some other tools but not succeeding.

2条回答
贼婆χ
2楼-- · 2019-03-02 02:42

The following should work in any RDBMS:

SELECT created_at, count(status) AS total,
       sum(case when status = 'error' then 1 end) as errors,
       sum(case when status = 'complete' then 1 end) as completed,
       sum(case when status = 'on hold' then 1 end) as on_hold
FROM jobs 
GROUP BY created_at;

The query uses conditional aggregation so as to pivot grouped data. It assumes that status values are known before-hand. If you have additional cases of status values, just add the corresponding sum(case ... expression.

Demo here

查看更多
劳资没心,怎么记你
3楼-- · 2019-03-02 02:45

An actual crosstab query would look like this:

SELECT * FROM crosstab(
   $$SELECT created_at, status, count(*) AS ct
     FROM   jobs 
     GROUP  BY 1, 2
     ORDER  BY 1, 2$$

  ,$$SELECT unnest('{error,complete,"on hold"}'::text[])$$)
AS ct (date date, errors int, completed int, on_hold int);

Should perform very well.

Basics:

The above does not yet include the total per date.
Postgres 9.5 introduces the ROLLUP clause, which is perfect for the case:

SELECT * FROM crosstab(
 $$SELECT created_at, COALESCE(status, 'total'), ct
   FROM  (
      SELECT created_at, status, count(*) AS ct
      FROM   jobs 
      GROUP  BY created_at, ROLLUP(status)
      ) sub
   ORDER  BY 1, 2$$

  ,$$SELECT unnest('{total,error,complete,"on hold"}'::text[])$$)
AS ct (date date, total int, errors int, completed int, on_hold int);

Up to Postgres 9.4, use this query instead:

WITH cte AS (
    SELECT created_at, status, count(*) AS ct
    FROM   jobs 
    GROUP  BY 1, 2
    )
TABLE  cte
UNION  ALL
SELECT created_at, 'total', sum(ct)
FROM   cte 
GROUP  BY 1
ORDER  BY 1

Related:


If you want to stick to a simple query, this is a bit shorter:

SELECT created_at
     , count(*) AS total
     , count(status = 'error' OR NULL)    AS errors
     , count(status = 'complete' OR NULL) AS completed
     , count(status = 'on hold' OR NULL)  AS on_hold
FROM   jobs 
GROUP  BY 1;

count(status) for the total per date is error-prone, because it would not count rows with NULL values in status. Use count(*) instead, which is also shorter and a bit faster.

Here is a list of techniques:

In Postgres 9.4+ use the new aggregate FILTER clause, like @a_horse mentioned:

SELECT created_at
     , count(*) AS total
     , count(*) FILTER (WHERE status = 'error')    AS errors
     , count(*) FILTER (WHERE status = 'complete') AS completed
     , count(*) FILTER (WHERE status = 'on hold')  AS on_hold
FROM   jobs 
GROUP  BY 1;

Details:

查看更多
登录 后发表回答