
Join a count query on a generate_series in PostgreSQL

Posted 2019-02-21 16:18

Question:

I want a statistic that lists each month from a generate_series together with the number of ids counted in that month. This SQL runs in PostgreSQL 9.1:

  SELECT (to_char(serie,'yyyy-mm')) AS year, sum(amount)::int AS eintraege FROM (
    SELECT  
       COUNT(mytable.id) as amount,   
       generate_series::date as serie   
       FROM mytable  

    RIGHT JOIN generate_series(  

       (SELECT min(date_from) FROM mytable)::date,   
       (SELECT max(date_from) FROM mytable)::date,  
       interval '1 day') ON generate_series = date(date_from)  
       WHERE version = 1   
       GROUP BY generate_series       
       ) AS foo  
  GROUP BY Year   
  ORDER BY Year ASC;  

And this is my output:

"2006-12" | 4  
"2007-02" | 1  
"2007-03" | 1  

But what I want to get is this output ("0" value in January):

"2006-12" | 4  
"2007-01" | 0  
"2007-02" | 1  
"2007-03" | 1  

So a month with no ids should still be listed. Any ideas how to solve this?

Here is some sample data:

drop table if exists mytable;
create table mytable(id bigint, version smallint, date_from timestamp without time zone);
insert into mytable(id, version, date_from) values

('4084036', '1', '2006-12-22 22:46:35'),
('4084938', '1', '2006-12-23 16:19:13'),
('4084938', '2', '2006-12-23 16:20:23'),
('4084939', '1', '2006-12-23 16:29:14'),
('4084954', '1', '2006-12-23 16:28:28'),
('4250653', '1', '2007-02-12 21:58:53'),
('4250657', '1', '2007-03-12 21:58:53')
;

Answer 1:

Untangled, simplified and fixed, it might look like this:

SELECT to_char(s.tag,'yyyy-mm') AS monat
     , count(t.id) AS eintraege
FROM  (
   SELECT generate_series(min(date_from)::date
                        , max(date_from)::date
                        , interval '1 day'
          )::date AS tag
   FROM   mytable t
   ) s
LEFT   JOIN mytable t ON t.date_from::date = s.tag AND t.version = 1   
GROUP  BY 1
ORDER  BY 1;

db<>fiddle here

Among all the noise, misleading identifiers and unconventional format, the actual problem was hidden here:

WHERE version = 1

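While you made correct use of RIGHT [OUTER] JOIN, you voided the effort by adding a WHERE clause that requires a non-null value from mytable, which effectively converts the RIGHT JOIN into a plain [INNER] JOIN. For every day that generate_series contributes without a matching row, version is NULL, so version = 1 is not true and the row is filtered out.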

Pull the clause down into the JOIN condition to make this work.
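
To make the difference concrete, here is a minimal sketch that runs the same outer join twice, once with the filter in WHERE and once with it in the JOIN condition. It assumes mytable holds the sample data from the question. The three-day range is hard-coded purely for illustration, and the alias s(tag) is my own choice.

```sql
-- (a) Filter in WHERE: days without a match have version = NULL after the
--     outer join, so "t.version = 1" is not true and those days disappear.
--     With the sample data, only 2007-02-12 survives.
SELECT s.tag::date AS tag, count(t.id) AS eintraege
FROM   generate_series(timestamp '2007-02-11'
                     , timestamp '2007-02-13'
                     , interval '1 day') AS s(tag)
LEFT   JOIN mytable t ON t.date_from::date = s.tag::date
WHERE  t.version = 1
GROUP  BY 1
ORDER  BY 1;

-- (b) Filter in the JOIN condition: the filter only decides which rows of
--     mytable may match; every generated day is kept, unmatched days count 0.
SELECT s.tag::date AS tag, count(t.id) AS eintraege
FROM   generate_series(timestamp '2007-02-11'
                     , timestamp '2007-02-13'
                     , interval '1 day') AS s(tag)
LEFT   JOIN mytable t ON t.date_from::date = s.tag::date
                     AND t.version = 1
GROUP  BY 1
ORDER  BY 1;
```

count(t.id) ignores NULLs, which is why unmatched days show 0 rather than 1 in variant (b).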

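I also simplified some other things: min and max are computed in a single subquery, the ids are counted in one aggregation step instead of summing per-day counts, and GROUP BY and ORDER BY use the positional reference 1.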
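
As a possible variation that is not part of the query above, the series could be generated per month instead of per day, so the outer join produces one row per month directly. This is only a sketch under the same assumption about mytable. The alias m and the half-open range condition are my own choices.

```sql
SELECT to_char(m.monat, 'yyyy-mm') AS monat
     , count(t.id)                 AS eintraege
FROM  (
   -- One row per month between the first and the last date_from
   SELECT generate_series(date_trunc('month', min(date_from))
                        , date_trunc('month', max(date_from))
                        , interval '1 month') AS monat
   FROM   mytable
   ) m
LEFT   JOIN mytable t ON t.date_from >= m.monat
                     AND t.date_from <  m.monat + interval '1 month'
                     AND t.version = 1   -- filter stays in the JOIN condition
GROUP  BY 1
ORDER  BY 1;
```

The range condition compares date_from directly rather than casting it, which should also allow a plain index on date_from to be used, though whether the planner does so depends on the data.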

Related:

  • Generating time series between two dates in PostgreSQL