I've got a table which contains start-times (using number in example to keep it simple), and durations of events.
I would like to identify "blocks" and their start- and the end-time.
Whenever the difference between the end-time (start-time + duration) of the previous row (sorted by start-time) and the start-time of the current row is >=5
, a new "block" should begin.
This is my test-data, including an attempt of a graphical explanation in the comments:
WITH test_data AS (
SELECT 0 s, 2 dur FROM dual UNION ALL --# ■■
SELECT 2 , 2 FROM dual UNION ALL --# ■■
SELECT 10 , 1 FROM dual UNION ALL --# ■
SELECT 13 , 4 FROM dual UNION ALL --# ■■■■
SELECT 15 , 4 FROM dual --# ■■■■
)
--# Should return
--# 0 .. 4 --# ■■■■
--# 10 .. 19 --# ■■■■■■■■■
The first block starts at 0
and ends at 4
. Since the difference to the next row is >=5
, start another block at 10
which ends at 19
.
I can identify the first row of a block, using LAG
, but I have not yet found out how to proceed.
And I could solve the problem in a PL/SQL-loop, but I'm trying to avoid that for performance reasons.
Any suggestions on how to write this query?
Thanks in advance, Peter
There is a fantastic book by Richard Snodgrass which may help: Developing Time-Oriented Database Applications in SQL (free to download) which I have found invaluable when dealing with time in databases.
Have a look on Richards page for links to some book corrections and the associated CD-ROM in zip format.
The code gets a bit complicated with a number of subqueries, etc. The may be instances of data where this doesn't work but I can't think of any off the top of my head.
Working with temporal data is always a pain!
In MS-SQL I would use
ROW_NUMBER() OVER(ORDER BY starttime) AS Rank
to rank the rows on start time.Then, I would write a query to join each line to the line with previous Rank and set a flag if the difference is bigger than five or NULL (first row).
Then, I would select all rows having this flag which are start rows, and for this subset repeat the process of numbering rows and joining to the next row to get the time spans:
Finally, this dataset can be joined to the original data with a
WHERE starttime BETWEEN blockstarttime and nextstarttime
to partition the results.Up to you to translate this to Oracle...
I use subqueries with analytics to identify and group contiguous ranges: