I have a logging table that contains data that looks like this:
ID  MSG                DATE
---------------------------------------------
 1  TEst               2010-01-01 09:00:00
 2  Job Start          2010-01-01 09:03:00
 3  Do something       2010-01-01 09:03:10
 4  Do something else  2010-01-01 09:03:12
 5  Do something       2010-01-01 09:04:19
 6  Job End            2010-01-01 09:06:30
 7  Job Start          2010-01-01 09:18:03
 8  Do something       2010-01-01 09:18:17
 9  Do other thing     2010-01-01 09:19:48
10  Job End            2010-01-01 09:20:27
It contains (among other things) messages written by the application. I am interested in producing a report of everything that gets written between a "Job Start" record and a "Job End" record, for all such pairs of "Job Start" and "Job End". Ideally the report would look like this:
BATCH_NUM  ID  MSG                DATE
---------------------------------------------------------
        1   3  Do something       2010-01-01 09:03:10
        1   4  Do something else  2010-01-01 09:03:12
        1   5  Do something       2010-01-01 09:04:19

        2   8  Do something       2010-01-01 09:18:17
        2   9  Do other thing     2010-01-01 09:19:48
(line break between batches added to make it easier to read)
The output report omits the "Job Start" and "Job End" messages themselves, as well as the "TEst" message (which falls outside any "Job Start"/"Job End" pair).
I'm not even sure where to begin writing this kind of query, or if it's better to go with PL/SQL for this. Heck, I'm not even sure if there's a proper technical term for what I'm trying to do. ;)
(Oracle version is 10g)
I'm sure there's a more analytic-ish way to do this, but I'm cheating with scalar subqueries to build the windows.
SQL> select * from logging_table;
        ID MSG                            LOG_DT
---------- ------------------------------ -------------------
         1 TEst                           2010-01-01 09:00:00
         2 Job Start                      2010-01-01 09:03:00
         3 Do something                   2010-01-01 09:03:10
         4 Do something else              2010-01-01 09:03:12
         5 Do something                   2010-01-01 09:04:19
         6 Job End                        2010-01-01 09:06:30
         7 Job Start                      2010-01-01 09:18:03
         8 Do something                   2010-01-01 09:18:17
         9 Do other thing                 2010-01-01 09:19:48
        10 Job End                        2010-01-01 09:20:27
select dense_rank() over (order by job_start_id) as batch,
       -- ^-- this part gets the batch
       job_step_id, msg, log_dt
  -- nested select to filter out rows outside of the boundaries
  from (select *
          from (select id as job_step_id, msg, log_dt,
                       -- scalar subquery to get start of "window"
                       (select max(id)
                          from logging_table
                         where msg = 'Job Start'
                           and id < log.id) as job_start_id,
                       -- scalar subquery to get end of "window"
                       (select min(id)
                          from logging_table
                         where msg = 'Job End'
                           and id > log.id) as job_end_id
                  from logging_table log
                 -- filter out the "window" rows themselves
                 where msg not in ('Job Start', 'Job End')
               )
         -- the filtering out of "unbounded" records
         where job_start_id is not null
           and job_end_id is not null
       )
 order by job_step_id;
     BATCH JOB_STEP_ID MSG                            LOG_DT
---------- ----------- ------------------------------ -------------------
         1           3 Do something                   2010-01-01 09:03:10
         1           4 Do something else              2010-01-01 09:03:12
         1           5 Do something                   2010-01-01 09:04:19
         2           8 Do something                   2010-01-01 09:18:17
         2           9 Do other thing                 2010-01-01 09:19:48
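For completeness, here is a sketch of the "more analytic-ish" variant hinted at above (untested, and assuming the same logging_table columns): running MAX() windows take the place of the scalar subqueries. Note that, unlike the version above, it would still report steps logged after a final "Job Start" that never gets a matching "Job End".

-- sketch only: tag each row with the latest 'Job Start' and 'Job End' ids
-- seen so far, then keep rows that fall after a start but before its end
select dense_rank() over (order by job_start_id) as batch,
       id as job_step_id, msg, log_dt
  from (select id, msg, log_dt,
               max(case when msg = 'Job Start' then id end)
                   over (order by id) as job_start_id,
               max(case when msg = 'Job End' then id end)
                   over (order by id) as job_end_id
          from logging_table)
 where msg not in ('Job Start', 'Job End')
   and job_start_id is not null
   and (job_end_id is null or job_end_id < job_start_id)
 order by job_step_id;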
The following sample uses much the same idea as Adam Musch's answer, but gets each logging set's start and end in a single select which is then joined to the logging table, rather than using scalar subqueries.
I don't think DENSE_RANK() can be used any more directly because there is no column available to partition on.
Also, either of these solutions assumes that there will never be overlapping sets of logs. If a second set starts before the first one ends, it's a whole new problem...
WITH logging_sets AS
     (SELECT DENSE_RANK () OVER (ORDER BY start_date) AS set_rank,
             start_date, end_date
        FROM (SELECT CASE msg
                        WHEN 'Job End' THEN NULL
                        ELSE LEAD (log_dt, 1, NULL) OVER (ORDER BY log_dt)
                     END AS end_date,
                     log_dt AS start_date, msg
                FROM logging_table lt
               WHERE msg IN ('Job Start', 'Job End'))
       WHERE msg = 'Job Start')
SELECT ls.set_rank, lt.id, lt.msg, lt.log_dt
  FROM logging_table lt, logging_sets ls
 WHERE lt.log_dt > ls.start_date
   AND lt.log_dt < ls.end_date
   AND lt.msg NOT IN ('Job Start', 'Job End')
 ORDER BY ls.set_rank, lt.log_dt;
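To illustrate, against the sample data in the question the logging_sets factored subquery should come out as two rows, roughly:

  SET_RANK START_DATE          END_DATE
---------- ------------------- -------------------
         1 2010-01-01 09:03:00 2010-01-01 09:06:30
         2 2010-01-01 09:18:03 2010-01-01 09:20:27

The outer join then picks up ids 3-5 and 8-9, matching the report asked for.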
Here's a way of doing the same trick without using window functions, just in case.
with logging_table as (
  select 1 id, 'job start' msg from dual union
  select 2, 'do something in batch 1' from dual union
  select 3, 'do something else in batch 1' from dual union
  select 4, 'job end' from dual union
  select 5, 'job start' from dual union
  select 6, 'do something in batch 2' from dual union
  select 7, 'do something else in batch 2' from dual union
  select 8, 'job end' from dual
),
jobs as (
  select lt_start.id id_start,
         (select min(id)
            from logging_table lt_end
           where lt_end.id > lt_start.id
             and msg = 'job end') id_end,
         rownum as batch_no
    from logging_table lt_start
   where msg = 'job start'
)
select *
  from logging_table join jobs
    on id > id_start and id < id_end
 order by batch_no, id;
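Run against that inline sample data, this should report ids 2 and 3 as batch 1 and ids 6 and 7 as batch 2. The 'job start'/'job end' rows themselves are excluded by the strict id > id_start and id < id_end join conditions, so no extra msg filter is needed in the final select.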