Group by end of period instead of start date

I'm looking to aggregate data by the end date of a period, with some leading window, rather than by the start date. For example, I want to query a table and return the count of matching rows in the 30 days PRIOR to each date shown in the results. The original table contains ONLY the timestamp at which each sale was made. Example:

sales_timestamp
------------------
2015-08-05 12:00:00
2015-08-06 13:00:00
2015-08-25 12:31:00
2015-08-26 01:02:00
2015-08-27 02:03:00
2015-08-29 04:23:00
2015-09-01 12:00:00
2015-09-02 12:00:00
2015-09-08 00:00:00

An example of the resulting query output would be:

date_period  |   count_of_sales
--------------------------------
2015-08-24   |        2
2015-08-31   |        6
2015-09-07   |        6

in which the date_period of 2015-09-07 would imply the company sold 6 items in the 30 days ENDING on 9/7/2015 (and starting ~8/7/2015 if a true 30 day period).

I've been toying with variations of the date_trunc() function but can't seem to get the truncation to apply on the end date rather than grouping by the start.

This data would be housed on PostgreSQL 9.1.

Tags: sql postgresql time-series postgresql-9.1 window-functions

1 Answer

我欲成王，谁敢阻挡

#2 · 2019-04-15 01:27

This query does all you ask for:

SELECT day::date AS date_period, count_of_sales
FROM (
   SELECT *, sum(ct) OVER (ORDER BY day ROWS 30 PRECEDING) AS count_of_sales
   FROM   generate_series(date '2015-08-24' - 30  -- start 30 days earlier
                        , date '2015-09-07'
                        , interval '1 day') day
   LEFT JOIN (
      SELECT date_trunc('day', sales_timestamp) AS day, count(*)::int AS ct
      FROM   sales
      GROUP  BY 1
      ) s USING (day)
   ) sub
JOIN  generate_series(date '2015-08-24'
                    , date '2015-09-07'
                    , interval '1 week') day USING (day);

SQL Fiddle.
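
To try the query locally, the sample rows from the question can be loaded first. This is only a minimal sketch: the table name sales and the column sales_timestamp are taken from the query above, while the column type timestamp (without time zone) and the NOT NULL constraint are assumptions.

```sql
-- Assumed schema: a single timestamp column, as described in the question.
CREATE TABLE sales (sales_timestamp timestamp NOT NULL);

-- Sample rows copied from the question.
INSERT INTO sales (sales_timestamp) VALUES
  ('2015-08-05 12:00:00')
, ('2015-08-06 13:00:00')
, ('2015-08-25 12:31:00')
, ('2015-08-26 01:02:00')
, ('2015-08-27 02:03:00')
, ('2015-08-29 04:23:00')
, ('2015-09-01 12:00:00')
, ('2015-09-02 12:00:00')
, ('2015-09-08 00:00:00');
```

With these rows the query should return counts of 2, 6 and 6 for 2015-08-24, 2015-08-31 and 2015-09-07, matching the output requested in the question.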

Explanation

1. Generate a full set of relevant days (1st generate_series()).
2. LEFT JOIN to the aggregated counts per day. The LEFT guarantees one row per day, which allows us to use window functions based on the row count.
3. Use sum() as window aggregate function with a custom frame of the 30 preceding rows plus the current row, which is 31 days in total. (You may want to use 29 instead, it's unclear how you count.)
4. Join the result to the actual days you want in the result (2nd generate_series() with one day per week).
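
Whether 30 or 29 is the right offset depends on whether the end date itself counts toward the 30 days. The following sketch, which is not part of the original query, counts the rows in each frame over a plain series of days so the difference is visible. The date range and the column aliases are chosen only for illustration.

```sql
-- One row per day, so count(*) over the frame equals the number of days covered.
SELECT d::date AS end_day
     , count(*) OVER (ORDER BY d ROWS 30 PRECEDING) AS days_with_30
     , count(*) OVER (ORDER BY d ROWS 29 PRECEDING) AS days_with_29
FROM   generate_series(timestamp '2015-08-01'
                     , timestamp '2015-09-07'
                     , interval '1 day') d
ORDER  BY d DESC
LIMIT  1;
-- end_day 2015-09-07: days_with_30 = 31, days_with_29 = 30
```

The frame always includes the current row, so ROWS 30 PRECEDING spans 31 days and ROWS 29 PRECEDING spans exactly 30 days ending on the reported date.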

Be aware that the definition of "day" is derived from the current time zone setting of your session if you work with timestamptz, so results can differ between time zones. This does not apply to plain timestamp, which does not depend on the current time zone. Basics:

Ignoring timezones altogether in Rails and PostgreSQL
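
As a small illustration of the time zone caveat, the same instant can be truncated to different days depending on the session setting. This sketch assumes a timestamptz value. The literal and the two zone names are picked only for the example.

```sql
BEGIN;

SET LOCAL timezone = 'UTC';
SELECT date_trunc('day', timestamptz '2015-08-26 01:02:00+00') AS day_utc;
-- 2015-08-26 00:00:00+00

SET LOCAL timezone = 'America/New_York';
SELECT date_trunc('day', timestamptz '2015-08-26 01:02:00+00') AS day_ny;
-- 2015-08-25 00:00:00-04

ROLLBACK;  -- SET LOCAL only lasts until the end of the transaction
```

If the column were timestamptz, one way to make results independent of the session would be to convert explicitly first, for example date_trunc('day', sales_timestamp AT TIME ZONE 'UTC'), where the choice of zone is up to you.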

Related answer with explanation for the window function with custom frame definition:

Select finishes where athlete didn't finish first for the past 3 events
