I have table event
:
event_date,
num_events,
site_id
I can easily use aggregate SQL to do SELECT SUM(num_events) GROUP BY site_id
.
But I also have another table site
:
site_id,
target_date
I'd like to do a JOIN, showing the SUM of num_events
within 60 days of the target_date
, 90 days, 120 days, etc. I thought this could easily be done using a WHERE
clause in the aggregate SQL. However, this is complicated by two challenges:
- The
target_date
is not fixed, but varies for each site_id
- I'd like multiple date ranges to be output in the same table; so I can't do a simple
WHERE
to exclude records falling outside the range from the event
table.
One workaround I've thought of is to simply make several queries, one for each date range, and then use a view to paste them together. Is there a simpler, better, or more elegant way to achieve my goals?
You would do something like:
select sum(case when target_date - event_date < 30 then 1 else 0 end) as within_030,
sum(case when target_date - event_date < 60 then 1 else 0 end) as within_060,
sum(case when target_date - event_date < 90 then 1 else 0 end) as within_090
from event e join
site s
on e.site_id = s.site_id;
That is, you can use conditional aggregation. I am not sure what "within 60" days means. This gives days before the target date, but similar logic will work for what you need.
In Postgres 9.4 use the new aggregate FILTER
clause:
Assuming actual date
data type, so we can simply add / subtracts integer
numbers for days.
Interpreting "within n days" as "+/- n days":
SELECT site_id, s.target_date
, sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 30
AND s.target_date + 30) AS sum_30
, sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 60
AND s.target_date + 60) AS sum_60
, sum(e.num_events) FILTER (WHERE e.event_date BETWEEN s.target_date - 90
AND s.target_date + 90) AS sum_90
FROM site s
JOIN event e USING (site_id)
WHERE e.event_date BETWEEN s.target_date - 90
AND s.target_date + 90
GROUP BY 1, 2;
Also add the condition as WHERE
clause to exclude irrelevant rows early. This should be substantially faster with more than a trivial number of rows outside the scope of sum_90
in event
.