I have a table:
create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');
I would like to construct a statement that given an event could return the length of the 'run' of events starting with that event. A run is defined by:
- Two events are in a run together if they are within 30 seconds of one another.
- If A and B are in a run together, and B and C are in a run together then A is in a run with C.
However my query does not need to go backwards in time, so if I select on event 2, then only events 2, 3, and 4 should be counted as part of the run of events starting with 2, and 3 should be returned as the length of the run.
Any ideas? I'm stumped.
Could look like this:
This version does not rely on a gap-less sequence of IDs, but on
event_time
only.Identical
event_time
's are additionally sorted byevent_id
to be unambiguous.Read about the window functions
row_number()
andlead()
and CTE (With clause) in the manual.Edit
If we cannot assume that a bigger
event_id
has a later (or equal)event_time
, substitute this for the firstWHERE
clause:Rows with the same
event_time
as the starting row but a a smallerevent_id
will still be ignored.In the special case of one run till the end no end is found and no row returned.
COALESCE
returns the count of all rows instead.Here is the RECURSIVE CTE-solution. (islands-and-gaps problems naturally lend themselves to recursive CTE)
Result:
You can join a table onto itself on a date difference statement. Actually, this is postgres, a simple minus works.
This subquery will find all records that is a 'start event'. That is to say, all event records that does not have another event record occurring within 30 seconds before it:
With a few changes...same logic, except picking up an 'end' event:
Now we can join these together to associate which start event goes to which end event:
(still writing...there's a couple ways at going on this. I'm assuming only the example has linear ID numbers, so you'll want to join the start event time to the end event time having the smallest positive difference on the event times).
Here's my end result...kinda nested a lot of subselects
Results as start_id, end_Id:
1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"
I had to use a third left join to pick out single events as a method of detecting events that were both start events and end events. End result is in ID's and can be linked back to your original table if you want different information than just the ID. Unsure how this solution will scale, if you've got millions of events...could be an issue.