I have a requirement to retrieve a list of employees, and for each employee a list of months they were actively on benefits coverage in a given year. There is a table with job data, and a table with benefits information. There is also a delivered dates table that lists out every date from 2007-2018 and for each date it shows the day of month, month of year, and calendar year.
The way I have written the query now is to say: find all the dates on the dates table that are 1) between 01/01 and 12/31 of the prompt year(or the current date, whichever is older), 2) during the time the employee was active on the benefits table. For each date I also want the deptid from the jobs table and the benefit plan from the benefit table as of that date. Then I do a distinct, only showing the month of year, and calendar year for each employee.
This works, but the problem comes when I try to do it for departments with lots of people in them. It takes a very long time to run, I believe because it is retrieving up to 365 rows for every single employee and then only showing 12 of those, since it is only pulling distinct months. I feel like there is a better way to do this, I just can't think of what it is.
Here are some simplified examples of the tables I'm working with:
Dates Table
THE_DATE MONTHOFYEAR CALENDAR_YEAR
01-OCT-15 10 2015
02-OCT-15 10 2015
03-OCT-15 10 2015
...
Jobs Table
(A=Active; I=Inactive)
EMPLID EFFDT DEPTID HR_STATUS
00123 01-FEB-15 900 A
00123 30-JUN-15 900 I
00123 01-AUG-15 901 A
Benefits Table
EMPLID EFFDT BENEFIT_PLAN STATUS
00123 01-MAR-15 PPO A
00123 31-JUL-15 I
00123 01-SEP-15 HMO A
Desired Result
EMPLID CALENDAR_YEAR MONTHOFYEAR DEPTID BENEFIT_PLAN
00123 2015 3 900 PPO
00123 2015 4 900 PPO
00123 2015 5 900 PPO
00123 2015 6 900 PPO
00123 2015 7 900 PPO
00123 2015 9 901 HMO
00123 2015 10 901 HMO
00123 2015 11 901 HMO
^ (shows November row even though employee was only covered for part of this month)
Example SQL to Get Results Above
SELECT DISTINCT J.EMPLID, D.CALENDAR_YEAR, D.MONTHOFYEAR, J.DEPTID, B.BENEFIT_PLAN
FROM DATES D,
JOBS J
JOIN
BENEFITS B
ON J.EMPLID = B.EMPLID
WHERE D.THE_DATE <= SYSDATE
AND D.THE_DATE BETWEEN
TO_DATE(:YEAR_PROMPT || '01-01', 'YYYY-MM-DD')
AND
TO_DATE(:YEAR_PROMPT || '12-31', 'YYYY-MM-DD')
AND B.STATUS = 'A'
AND D.THE_DATE BETWEEN
B.EFFDT
AND
NVL(SELECT MIN(B_ED.EFFDT)
FROM BENEFITS B_ED
WHERE B_ED.EMPLID = B.EMPLID
AND B_ED.EFFDT > B.EFFDT
, SYSDATE)
AND J.EFFDT = (SELECT MAX(J_ED.EFFDT)
FROM JOBS J_ED
WHERE J_ED.EMPLID = J.EMPLID
AND J_ED.EFFDT <= D.THE_DATE)
Instead of saying "retrieve every single date and check to see if it fits the criteria", can I change up the logic somehow to get the same results without churning through so many rows?
Yes; by using the
LEAD()
analytic function, you can calculate the next effdt in the jobs and benefits tables, which makes it easier to query between the ranges.Something like:
(Obviously, you can exclude the
dates
,jobs
andbenefits
subqueries from the above query, since you already have those tables. They're only present in the query to simulate having tables with that data in it without needing to actually create the tables.).ETA: Here's a version that just calculates the 12 months based on the year that's passed in, which reduces the date rows to 12, rather than 365/366 rows.
Unfortunately, you'll still need the distinct, to take account of when you have multiple rows starting in the same month.
For example, with the data in the following example, you would end up with 3 rows for month 6 if you removed the distinct. However, the number of rows the distinct is operating over will be far less than previously.