Group by end of period instead of start date

Posted 2019-04-15 00:33

Question:

I'm looking to aggregate data by the end date of each period, with a leading window before it, rather than by the start date. For example, I want to query a table and return the count of matching rows in the 30 days PRIOR to the date shown in each result row. The original table contains ONLY the timestamp of each sale. Example:

sales_timestamp
------------------
2015-08-05 12:00:00
2015-08-06 13:00:00
2015-08-25 12:31:00
2015-08-26 01:02:00
2015-08-27 02:03:00
2015-08-29 04:23:00
2015-09-01 12:00:00
2015-09-02 12:00:00
2015-09-08 00:00:00

An example of the resulting query output would be:

date_period  |   count_of_sales
--------------------------------
2015-08-24   |        2
2015-08-31   |        6
2015-09-07   |        6

Here, the date_period of 2015-09-07 means that the company sold 6 items in the 30 days ENDING on 9/7/2015 (and starting ~8/7/2015 for a true 30-day period).

I've been toying with variations of the date_trunc() function but can't seem to get the truncation to apply on the end date rather than grouping by the start.

This data would be housed on PostgreSQL 9.1.

Answer 1:

This query does all you ask for:

SELECT day::date AS date_period, count_of_sales
FROM (
   -- running total over the current day plus the 30 preceding days
   SELECT *, sum(ct) OVER (ORDER BY day ROWS 30 PRECEDING) AS count_of_sales
   FROM   generate_series(date '2015-08-24' - 30  -- start 30 days earlier
                        , date '2015-09-07'
                        , interval '1 day') day
   LEFT JOIN (
      -- number of sales per calendar day
      SELECT date_trunc('day', sales_timestamp) AS day, count(*)::int AS ct
      FROM   sales
      GROUP  BY 1
      ) s USING (day)
   ) sub
-- keep only the weekly period end dates
JOIN  generate_series(date '2015-08-24'
                    , date '2015-09-07'
                    , interval '1 week') day USING (day);

SQL Fiddle.

Explanation

  1. Generate a full set of relevant days (1st generate_series()).
  2. LEFT JOIN to the aggregated counts per day. The LEFT guarantees one row per day, which allows us to use window functions based on the row count.
  3. Use sum() as a window aggregate function with a custom frame of 30 rows preceding, i.e. the current day plus the 30 days before it. (You may want to use 29 instead; it's unclear how you count.)
  4. Join the result to the actual days you want in the result (2nd generate_series() with one day per week).
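
The four steps above can also be traced outside the database. The following is a minimal Python sketch of the same logic, not a replacement for the query. The function name trailing_counts and its parameters are made up for illustration, and the sample rows are the ones from the question.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Sample rows from the question (the sales_timestamp column).
SAMPLE = [
    datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    for s in (
        "2015-08-05 12:00:00", "2015-08-06 13:00:00", "2015-08-25 12:31:00",
        "2015-08-26 01:02:00", "2015-08-27 02:03:00", "2015-08-29 04:23:00",
        "2015-09-01 12:00:00", "2015-09-02 12:00:00", "2015-09-08 00:00:00",
    )
]


def trailing_counts(sales, first_end, last_end, preceding=30, step_days=7):
    """Hypothetical helper: count sales on each end date plus the
    `preceding` days before it, for end dates spaced `step_days` apart."""
    # Step 2: aggregate to one count per calendar day.
    per_day = Counter(ts.date() for ts in sales)
    result = []
    end = first_end
    while end <= last_end:
        # Steps 1 and 3: visit every day in the frame; a Counter returns 0
        # for days without sales, like the gaps filled by the LEFT JOIN.
        total = sum(per_day[end - timedelta(days=k)] for k in range(preceding + 1))
        result.append((end, total))
        # Step 4: only the weekly period end dates are reported.
        end += timedelta(days=step_days)
    return result


for period_end, count in trailing_counts(SAMPLE, date(2015, 8, 24), date(2015, 9, 7)):
    print(period_end, count)
```

For the sample data this prints 2, 6 and 6 for the three period end dates, matching the output requested in the question. With preceding=29 the sample happens to give the same counts, so the choice between 29 and 30 only shows up with sales close to the edge of the window.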

Be aware that the definition of "day" is derived from the current time zone setting of your session if you work with timestamptz, so results can differ between time zones. This does not apply to plain timestamp, which does not depend on the current time zone. Basics:

  • Ignoring timezones altogether in Rails and PostgreSQL
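
To illustrate the time zone caveat: the same instant can fall on different calendar days depending on the zone it is viewed in, which changes the daily bucket a sale lands in. The Python sketch below is only an analogy for a session's time zone setting. The instant is hypothetical (it is not one of the rows from the question), and fixed UTC offsets stand in for real time zones.

```python
from datetime import datetime, timedelta, timezone


def local_day(instant, utc_offset_hours):
    """Calendar day of a time-zone-aware instant, seen at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return instant.astimezone(tz).date()


# Hypothetical sale shortly before midnight UTC.
sale = datetime(2015, 8, 31, 23, 30, tzinfo=timezone.utc)

print(local_day(sale, 0))  # 2015-08-31
print(local_day(sale, 2))  # 2015-09-01
```

At an offset of +2 hours the sale moves from the period ending 2015-08-31 into the next one, which is why counts over timestamptz can differ between sessions.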

Related answer with explanation for the window function with custom frame definition:

  • Select finishes where athlete didn't finish first for the past 3 events