-->

Coalescing date ranges in postgres to eliminate ov

2019-09-07 04:20发布

问题:

If I have a postgres table a:

  member |    start   |    end
---------+------------+------------
    1    | 2015-01-01 | 2015-05-01
---------+------------+------------
    1    | 2015-03-01 | 2015-06-01
---------+------------+------------
    2    | 2015-01-01 | 2015-05-01
---------+------------+------------
    2    | 2015-06-01 | 2015-08-01

How would I coalesce dates to eliminate overlapping ranges like this:

  member |    start   |    end
---------+------------+------------
    1    | 2015-01-01 | 2015-06-01
---------+------------+------------
    2    | 2015-01-01 | 2015-05-01
---------+------------+------------
    2    | 2015-06-01 | 2015-08-01

回答1:

In the chop CTE original ranges are "chopped" into smaller, non-intersecting (but possibly adjacent) ranges. They are constructed from all the end points of the original ranges, both start and finish.

Main select works as follows (read it from the inside out):

  1. Adjacent flag for a range is zero when it adjoins the previous range (assuming that ranges are ordered by their start dates).
  2. A cumulative sum of adjacent flags gives us a grouping value: all the adjacent ranges will have the same sum.
  3. Outermost block simply calculates the bounding values for the adjacent groups of ranges.

Black magic of window functions...

with chop as (
  select member,
         pt as start,
         lead(pt) over (partition by member order by pt) finish,
         (
           select count(*)
           from   a
           where  b.member = a.member
           and    b.pt >= a.start
           and    b.pt < a.finish
         ) need_it
  from   (
           select member, start pt from a
           union
           select member, finish pt from a
         ) b
)
-- 3
select member,
       min(start),
       max(finish)
from   (
         -- 2
         select member,
                start,
                finish,
                sum(adjacent) over (partition by member order by start) grp
         from   (
                  -- 1
                  select member,
                         start,
                         finish,
                         case
                           when start <= lag(finish) over (partition by member order by start)
                           then 0
                           else 1
                         end adjacent
                  from   chop
                  where  need_it > 0
                ) t
       ) q
group by member,
         grp
order by member,
         min(start);

I renamed end to finish because end is a keyword.