I have a list of events that occur at millisecond-accurate intervals, spanning a few days. I want to cluster all the events that occur in a 'per-n-minutes' slot (can be twenty events, can be no events). I have a `datetime.datetime` item for each event, so I can get `datetime.datetime.minute` without any trouble.
My list of events is sorted in time order, earliest first, latest last. The list is complete for the time period I am working on.
The idea being that I can change list:-
[[a],[b],[c],[d],[e],[f],[g],[h],[i]...]
where a, b, c, occur between mins 0 and 29, d,e,f,g occur between mins 30 and 59, nothing between 0 and 29 (next hour), h, i between 30 and 59 ...
into a new list:-
[[[a],[b],[c]],[[d],[e],[f],[g]],[],[[h],[i]]...]
I'm not sure how to build an iterator that loops through the two time slots until the time series list ends. Anything I can think of using `xrange` stops once it completes, so I wondered if there was a way of using `while` to do the slicing?
I will also be using a smaller timeslot, probably 5 mins; I used 30 mins as a shorter example for demonstration.
(For context, I'm making a geo-plotted, time-based view of the recent quakes in New Zealand, and want to show all the quakes that occur in a small block of time in one step to speed up the replay.)
This is a Python translation of this answer, which works by rounding the datetime to the next boundary and using that for grouping.
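The original snippet for this answer is not preserved here, so the following is only a minimal sketch of the idea in Python 3. It assumes the events are plain `datetime` objects, that the slot length divides a day evenly, and it uses a hypothetical helper name, `ceil_to_slot`, that is not from the original answer.

```python
from datetime import datetime, timedelta
from itertools import groupby


def ceil_to_slot(dt, minutes=30):
    """Round dt up to the next slot boundary (hypothetical helper).

    A datetime that already sits exactly on a boundary maps to itself.
    """
    slot = timedelta(minutes=minutes)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = dt - midnight
    # -(-a // b) is ceiling division; timedelta // timedelta yields an int.
    slots = -(-elapsed // slot)
    return midnight + slots * slot


# Example input: already sorted, as groupby only merges adjacent items.
events = [
    datetime(2011, 2, 22, 0, 5),
    datetime(2011, 2, 22, 0, 20),
    datetime(2011, 2, 22, 0, 40),
    datetime(2011, 2, 22, 1, 45),
]

# Each entry is (slot boundary, events that round up to that boundary).
grouped = [(key, list(group)) for key, group in groupby(events, key=ceil_to_slot)]
```

With this input the slot ending at 01:30 has no events, so it simply does not appear in `grouped`.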
If you really need the possible empty groups, you can just add them by using this or a similar method:
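One possible way to add the empty groups, sketched under the assumption that the non-empty groups are already held in a dict keyed by slot boundary. The function name `fill_empty_slots` and the dict layout are illustrative, not taken from the original answer.

```python
from datetime import datetime, timedelta


def fill_empty_slots(groups, slot):
    """Return one list per slot from the earliest to the latest key.

    groups: non-empty dict mapping a slot boundary (datetime) to its events.
    slot:   timedelta giving the slot length.
    Slots with no entry in groups become empty lists.
    """
    result = []
    current, last = min(groups), max(groups)
    while current <= last:
        result.append(groups.get(current, []))
        current += slot
    return result


groups = {
    datetime(2011, 2, 22, 0, 30): ["a", "b", "c"],
    datetime(2011, 2, 22, 1, 0): ["d"],
    datetime(2011, 2, 22, 2, 0): ["h"],
}
filled = fill_empty_slots(groups, timedelta(minutes=30))
```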
If you have the whole list, you can just loop over it and stick each event in the right timeslot directly:
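A minimal sketch of that direct approach, assuming a sorted list of plain `datetime` objects and a slot length that divides an hour evenly, so slots can be aligned to the hour of the first event. The name `bucket_events` is made up for illustration.

```python
from datetime import datetime, timedelta


def bucket_events(times, slot=timedelta(minutes=30)):
    """Place each datetime in its slot; empty slots stay as empty lists."""
    if not times:
        return []
    # Align the first slot to the start of the hour of the first event.
    start = times[0].replace(minute=0, second=0, microsecond=0)
    n_slots = (times[-1] - start) // slot + 1
    buckets = [[] for _ in range(n_slots)]
    for t in times:
        # timedelta // timedelta gives the integer slot index.
        buckets[(t - start) // slot].append(t)
    return buckets


times = [
    datetime(2011, 2, 22, 0, 5),
    datetime(2011, 2, 22, 0, 20),
    datetime(2011, 2, 22, 0, 40),
    datetime(2011, 2, 22, 1, 45),
]
buckets = bucket_events(times)
```

Because the number of slots is computed up front, the empty 01:00 to 01:29 slot shows up as an empty list rather than being skipped.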
If you need to turn an iterable of events into a grouped iterable, things get a bit messier. `itertools.groupby` almost works, but it skips time intervals with no events in them.

Consider the following, where you will need to write your own `time_in_range` function.

I have this definition which might help you. It has no library dependencies and uses a while loop as requested:
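The definitions these answers referred to are not preserved here. As a rough sketch of the while-loop idea, the following walks a sorted list once and emits one list per slot, including empty ones. Both `time_in_range` and `slice_by_slot` are written for this illustration and assume sorted `datetime` objects with `start` no later than the first event.

```python
from datetime import datetime, timedelta


def time_in_range(t, start, end):
    """True if t falls in the half-open interval [start, end)."""
    return start <= t < end


def slice_by_slot(times, start, slot):
    """Slice sorted datetimes into consecutive slots of length `slot`."""
    if times and times[0] < start:
        raise ValueError("start must not be later than the first event")
    groups = []
    i = 0
    slot_start = start
    # Outer loop: one iteration per slot, until every event is consumed.
    # Relies on `times` being sorted in ascending order.
    while i < len(times):
        slot_end = slot_start + slot
        current = []
        # Inner loop: take events while they fall inside the current slot.
        while i < len(times) and time_in_range(times[i], slot_start, slot_end):
            current.append(times[i])
            i += 1
        groups.append(current)
        slot_start = slot_end
    return groups


times = [
    datetime(2011, 2, 22, 0, 5),
    datetime(2011, 2, 22, 0, 20),
    datetime(2011, 2, 22, 0, 40),
    datetime(2011, 2, 22, 1, 45),
]
slices = slice_by_slot(times, datetime(2011, 2, 22, 0, 0), timedelta(minutes=30))
```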
If you have two lists, Unix timestamps and values, each the same length, where `timestamps[0]` is the timestamp for `values[0]`, and so on.
Let's say you have 30 days of data, starting Nov 2011, and you want it grouped hourly:
This will return a list of lists for each hour, with empty lists in the hours with no data.
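A sketch of what such hourly grouping might look like. The function name `group_by_interval`, its parameters, and the choice of UTC for the start of November 2011 are assumptions made for this example, not the original answer's code.

```python
import calendar


def group_by_interval(timestamps, values, start, n_slots, width=3600):
    """Group values into n_slots consecutive slots of `width` seconds.

    timestamps and values are parallel lists; timestamps are Unix seconds.
    Values whose timestamp falls outside the covered period are ignored.
    """
    slots = [[] for _ in range(n_slots)]
    for ts, value in zip(timestamps, values):
        index = int((ts - start) // width)
        if 0 <= index < n_slots:
            slots[index].append(value)
    return slots


# 1 Nov 2011 00:00:00, interpreted as UTC (an assumption for this sketch).
start = calendar.timegm((2011, 11, 1, 0, 0, 0))

timestamps = [start + 10, start + 3599, start + 7200]
values = ["a", "b", "c"]

# 30 days of hourly slots.
hourly = group_by_interval(timestamps, values, start, n_slots=30 * 24)
```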
Assuming that the events are available in a chronologically ordered list called `events`, having a `datetime` attribute called `timestamp`:

This uses the first event as t=0 on the timeline. If that's not what you want, just substitute `events[0].timestamp` with a reference to a `datetime` instance that represents your t=0.

You could use the slotter module. I had a similar problem and I ended up writing a generic solution - https://github.com/saurabh-hirani/slotter
An asciinema demo - https://asciinema.org/a/8mm8f0qqurk4rqt90drkpvp1b?autoplay=1
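Going back to the approach that assumes a chronologically ordered `events` list with a `timestamp` attribute and treats the first event as t=0: its code is not preserved here, so this is only a sketch of how it could work. The `Event` type, the function name `group_from_first`, and the `origin` parameter are hypothetical.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical event type; any object with a `timestamp` attribute would do.
Event = namedtuple("Event", ["name", "timestamp"])


def group_from_first(events, slot=timedelta(minutes=5), origin=None):
    """Group sorted events into slots measured from `origin`.

    If origin is None, the first event's timestamp is used as t=0.
    """
    if not events:
        return []
    if origin is None:
        origin = events[0].timestamp
    groups = []
    for event in events:
        index = (event.timestamp - origin) // slot
        if index < 0:
            raise ValueError("event occurs before origin")
        # Pad with empty slots until the target slot exists.
        while len(groups) <= index:
            groups.append([])
        groups[index].append(event)
    return groups


events = [
    Event("a", datetime(2011, 2, 22, 0, 5)),
    Event("b", datetime(2011, 2, 22, 0, 7)),
    Event("c", datetime(2011, 2, 22, 0, 16)),
]
slots = group_from_first(events)
```

Passing an explicit `origin`, for example the top of the hour, shifts the slot boundaries without changing the rest of the logic.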