I have a table in a Postgres DB like this:
person | eventdate | type
--------------------------------------
<uuid-1> | 2016-05-14 | 300
<uuid-3> | 2016-05-14 | 300
<uuid-1> | 2016-05-15 | 301
<uuid-1> | 2016-05-16 | 301
<uuid-1> | 2016-05-18 | 304
<uuid-1> | 2016-05-22 | 300
<uuid-2> | 2016-05-22 | 304
<uuid-2> | 2016-05-27 | 301
<uuid-1> | 2016-05-30 | 300
<uuid-1> | 2016-06-01 | 300
<uuid-2> | 2016-06-15 | 501
<uuid-2> | 2016-06-16 | 301
<uuid-4> | 2016-06-16 | 300
<uuid-5> | 2016-06-20 | 300
<uuid-1> | 2016-06-21 | 300
<uuid-2> | 2016-06-21 | 300
<uuid-2> | 2016-06-23 | 301
<uuid-2> | 2016-06-30 | 300
<uuid-3> | 2016-06-30 | 300
<uuid-4> | 2016-06-30 | 300
The table contains non-consecutive entries, one for each day an employee is absent, with different reasons (types) of absence. However, an absence period can span several of these days, and any absence entry within 5 days of a previous absence of the same type is still considered part of the same absence 'period'.
I need output listing each employee's absence periods with the start and end date of each period, plus the total number of absence days within that period.
This is further complicated by the fact that different types of absence are considered the same for the purpose of this report. So in the example above, types 300, 301 and 304 would be treated as the same.
So from my example above, the following is what I am after:
person | startdate | enddate | days | type
--------------------------------------------------------------------
<uuid-1> | 2016-05-14 | 2016-05-22 | 5 | 300
<uuid-3> | 2016-05-14 | 2016-05-14 | 1 | 300
<uuid-2> | 2016-05-22 | 2016-05-27 | 2 | 304
<uuid-1> | 2016-05-30 | 2016-06-01 | 2 | 300
<uuid-2> | 2016-06-15 | 2016-06-15 | 1 | 501
<uuid-2> | 2016-06-16 | 2016-06-16 | 1 | 301
<uuid-4> | 2016-06-16 | 2016-06-16 | 1 | 300
<uuid-5> | 2016-06-20 | 2016-06-20 | 1 | 300
<uuid-1> | 2016-06-21 | 2016-06-21 | 1 | 300
<uuid-2> | 2016-06-21 | 2016-06-23 | 2 | 300
<uuid-2> | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-3> | 2016-06-30 | 2016-06-30 | 1 | 300
<uuid-4> | 2016-06-30 | 2016-06-30 | 1 | 300
How do I query this table to get this output?
It's unclear how you determine the type for each period. I chose the minimum number.
Assuming this basic table definition:
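The definition itself is not preserved here, so the following is only a minimal sketch of what it might look like. The table name tbl and the column types are assumptions inferred from the sample data in the question.

```sql
-- Hypothetical table definition; name and column types are assumed.
CREATE TABLE tbl (
   person    text     -- placeholder values like <uuid-1>; type uuid would work as well
 , eventdate date     -- one row per day of absence
 , type      integer  -- absence reason code, e.g. 300, 301, 304
);
```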
Basically, I suggest window functions in two nested subqueries to identify members of the same period (island). Then aggregate:
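A sketch of such a query, assuming a table tbl with columns person, eventdate (type date) and type (integer). The names gap, period, sub1 and sub2 are illustrative. The inner subquery flags the first row of each island, the middle one turns the flags into a running period number per person, and the outer query aggregates per period.

```sql
SELECT person, period
     , min(eventdate) AS startdate
     , max(eventdate) AS enddate
     , count(*)       AS days
     , min(type)      AS type        -- minimum type per period, as chosen above
FROM  (
   SELECT person, eventdate, type
          -- running count of period starts = period number per person
        , count(gap) OVER (PARTITION BY person ORDER BY eventdate) AS period
   FROM  (
      SELECT person, eventdate, type
           , CASE WHEN lag(eventdate) OVER (PARTITION BY person ORDER BY eventdate)
                       > eventdate - 6  -- previous entry at most 5 days earlier
                  THEN NULL             -- same period
                  ELSE TRUE             -- first row or larger gap: new period
             END AS gap
      FROM   tbl
      ) sub1
   ) sub2
GROUP  BY person, period
ORDER  BY person, period;
```

Note that this sketch treats all types as one group per person. If only certain types should count as equivalent, a filter or a type mapping would have to be added, which is not shown here.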
Result (based on your example data):
If the same day for the same person can be entered multiple times with different types, and you only want to count distinct days, make it:
count(DISTINCT eventdate) AS days
Related, with detailed explanation:
BTW, eventdate - 6 works for data type date, but not for timestamp:
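A short illustration of the difference, using literal values chosen only for this example: subtracting an integer from a date subtracts days, while a timestamp needs an interval.

```sql
-- date minus integer subtracts that many days
SELECT date '2016-05-22' - 6;                               -- 2016-05-16

-- timestamp has no "minus integer" operator; subtract an interval instead
SELECT timestamp '2016-05-22 00:00' - interval '6 days';

-- This would raise an error (operator does not exist: timestamp - integer):
-- SELECT timestamp '2016-05-22 00:00' - 6;
```

So if eventdate were a timestamp column, the comparison in the query would need eventdate - interval '6 days' in place of eventdate - 6.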