Calculate the sum of a field filtered by a window

2019-08-29 23:12发布

I have table event:

event_date,
num_events,
site_id

I can easily use aggregate SQL to do SELECT SUM(num_events) GROUP BY site_id.

But I also have another table site:

site_id,
target_date

I'd like to do a JOIN, showing the SUM of num_events within 60 days of the target_date, 90 days, 120 days, etc. I thought this could easily be done using a WHERE clause in the aggregate SQL. However, this is complicated by two challenges:

  1. The target_date is not fixed, but varies for each site_id
  2. I'd like multiple date ranges to be output in the same table; so I can't do a simple WHERE to exclude records falling outside the range from the event table.

One workaround I've thought of is to simply make several queries, one for each date range, and then use a view to paste them together. Is there a simpler, better, or more elegant way to achieve my goals?

2条回答
狗以群分
2楼-- · 2019-08-29 23:33

You would do something like:

select sum(case when target_date - event_date < 30 then 1 else 0 end) as within_030,
       sum(case when target_date - event_date < 60 then 1 else 0 end) as within_060,
       sum(case when target_date - event_date < 90 then 1 else 0 end) as within_090    
from event e join
     site s
     on e.site_id = s.site_id;

That is, you can use conditional aggregation. I am not sure what "within 60" days means. This gives days before the target date, but similar logic will work for what you need.

查看更多
相关推荐>>
3楼-- · 2019-08-29 23:58

In Postgres 9.4 use the new aggregate FILTER clause:

Assuming actual date data type, so we can simply add / subtracts integer numbers for days.
Interpreting "within n days" as "+/- n days":

SELECT site_id, s.target_date
     , sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 30
                                             AND s.target_date + 30) AS sum_30
     , sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 60
                                             AND s.target_date + 60) AS sum_60
     , sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 90
                                             AND s.target_date + 90) AS sum_90
FROM   site  s
JOIN   event e USING (site_id)
WHERE   e.event_date BETWEEN s.target_date - 90
                         AND s.target_date + 90
GROUP  BY 1, 2;

Also add the condition as WHERE clause to exclude irrelevant rows early. This should be substantially faster with more than a trivial number of rows outside the scope of sum_90 in event.

查看更多
登录 后发表回答