I am new to SQL and need to be able to solve the following problem in both Hive and Postgres.
Data
I have a some data showing the start day and end day for different pre-prioritised tasks per person:
person task_key start_day end_day
1 Kate A 1 5
2 Kate B 1 5
3 Adam A 1 5
4 Adam B 2 5
5 Eve A 2 5
6 Eve B 1 5
7 Jason A 1 5
8 Jason B 4 5
9 Jason C 3 5
10 Jason D 5 5
11 Jason E 4 5
NOTE: Task key is ordered so that higher letters have higher priorities.
Question
I need to work out which task each person should be working on each day, with the condition that:
- Higher lettered tasks take priority over lower lettered tasks.
- If a higher lettered task overlaps any part of a lower lettered task, then the lower lettered task gets set to NA (to represent that the person should not work on it ever).
Simplification In the real data the end_day is always 5 in the original table i.e. only the start_day varies but the end_day is constant. This means my desired output will have the same number of rows as my original table :)
Output
This is the sort of output I need (Jason is more representative of the data I have which can be over 100 tasks covering a period of 90 days):
person task_key start_day end_day valid_from valid_to
1 Kate A 1 5 NA NA
2 Kate B 1 5 1 5
3 Adam A 1 5 1 2
4 Adam B 2 5 2 5
5 Eve A 2 5 NA NA
6 Eve B 1 5 1 5
7 Jason A 1 5 1 3
8 Jason B 4 5 NA NA
9 Jason C 3 5 3 4
10 Jason D 5 5 NA NA
11 Jason E 4 5 4 5
Thank you for your time in advance.
P.S. Similar question I have asked but in R: How to use a window function to determine when to perform different tasks?