Calculate the sum of a field filtered by a window

2019-08-29 23:16发布

问题:

I have table event:

event_date,
num_events,
site_id

I can easily use aggregate SQL to do SELECT SUM(num_events) GROUP BY site_id.

But I also have another table site:

site_id,
target_date

I'd like to do a JOIN, showing the SUM of num_events within 60 days of the target_date, 90 days, 120 days, etc. I thought this could easily be done using a WHERE clause in the aggregate SQL. However, this is complicated by two challenges:

  1. The target_date is not fixed, but varies for each site_id
  2. I'd like multiple date ranges to be output in the same table; so I can't do a simple WHERE to exclude records falling outside the range from the event table.

One workaround I've thought of is to simply make several queries, one for each date range, and then use a view to paste them together. Is there a simpler, better, or more elegant way to achieve my goals?

回答1:

You would do something like:

select sum(case when target_date - event_date < 30 then 1 else 0 end) as within_030,
       sum(case when target_date - event_date < 60 then 1 else 0 end) as within_060,
       sum(case when target_date - event_date < 90 then 1 else 0 end) as within_090    
from event e join
     site s
     on e.site_id = s.site_id;

That is, you can use conditional aggregation. I am not sure what "within 60" days means. This gives days before the target date, but similar logic will work for what you need.



回答2:

In Postgres 9.4 use the new aggregate FILTER clause:

Assuming actual date data type, so we can simply add / subtracts integer numbers for days.
Interpreting "within n days" as "+/- n days":

SELECT site_id, s.target_date
     , sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 30
                                             AND s.target_date + 30) AS sum_30
     , sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 60
                                             AND s.target_date + 60) AS sum_60
     , sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 90
                                             AND s.target_date + 90) AS sum_90
FROM   site  s
JOIN   event e USING (site_id)
WHERE   e.event_date BETWEEN s.target_date - 90
                         AND s.target_date + 90
GROUP  BY 1, 2;

Also add the condition as WHERE clause to exclude irrelevant rows early. This should be substantially faster with more than a trivial number of rows outside the scope of sum_90 in event.