Python: Grouping into timeslots (minutes) for days

2019-04-08 01:14发布

I have a list of events that occur at mS accurate intervals, that spans a few days. I want to cluster all the events that occur in a 'per-n-minutes' slot (can be twenty events, can be no events). I have a datetime.datetime item for each event, so I can get datetime.datetime.minute without any trouble.

My list of events is sorted in time order, earliest first, latest last. The list is complete for the time period I am working on.

The idea being that I can change list:-

[[a],[b],[c],[d],[e],[f],[g],[h],[i]...]

where a, b, c, occur between mins 0 and 29, d,e,f,g occur between mins 30 and 59, nothing between 0 and 29 (next hour), h, i between 30 and 59 ...

into a new list:-

[[[a],[b],[c]],[[d],[e],[f],[g]],[],[[h],[i]]...]

I'm not sure how to build an iterator that loops through the two time slots until the time series list ends. Anything I can think of using xrange stops once it completes, so I wondered if there was a way of using `while' to do the slicing?

I also will be using a smaller timeslot, probably 5 mins, I used 30mins as a shorter example for demonstration.

(for context, I'm making a geo plotted time based view of the recent quakes in New Zealand. and want to show all the quakes that occurs in a small block of time in one step to speed up the replay)

6条回答
2楼-- · 2019-04-08 01:54
# create sample data
from datetime import datetime, timedelta
d = datetime.now()
data = [d + timedelta(minutes=i) for i in xrange(100)]

# prepare and group the data
from itertools import groupby

def get_key(d):
    # group by 30 minutes
    k = d + timedelta(minutes=-(d.minute % 30)) 
    return datetime(k.year, k.month, k.day, k.hour, k.minute, 0)

g = groupby(sorted(data), key=get_key)

# print data
for key, items in g:
    print key
    for item in items:
        print '-', item

This is a python translation of this answer, which works by rounding the datetime to the next boundary and use that for grouping.


If you really need the possible empty groups, you can just add them by using this or a similar method:

def add_missing_empty_frames(g):
    last_key = None
    for key, items in g:
        if last_key:
            while (key-last_key).seconds > 30*60:
                empty_key = last_key + timedelta(minutes=30)
                yield (empty_key, [])
                last_key = empty_key
        yield (key, items)
        last_key = key

for key, items in add_missing_empty_frames(g):
    ...
查看更多
姐就是有狂的资本
3楼-- · 2019-04-08 01:59

If you have the whole list, you can just loop over it and stick each event in the right timeslot directly:

grouped = [[] for _ in xrange(whatever)]
for event in events:
    grouped[timeslot_of(event)].append(event)

If you need to turn an iterable of events into a grouped iterable, things get a bit messier. itertools.groupby almost works, but it skips time intervals with no events in them.

查看更多
混吃等死
4楼-- · 2019-04-08 02:07

Consider the following

def time_in_range(t,t_min,delta_t):
    if t<=t_min+delta_t and t>=t_min:
         return True
    else:
         return False
def group_list(input_list,ref_time,time_dx,result=[]):
    result.append([])
    for i,item in enumerate(input_list):
        if time_in_range(item,ref_time,time_dx):
            result[-1].append(item)
        else:
            return group_list(input_list[i:],ref_time+time_dx,time_dx,result=result)
def test():
    input_list = [1,2,3,4,5,8,10,20,30]
    print group_list(input_list,0,5)
test()
# Ouput:
# [[1, 2, 3, 4, 5], [8, 10], [], [20], [], [30]]

where you will need to write your own time_in_range function.

查看更多
贪生不怕死
5楼-- · 2019-04-08 02:09

I wondered if there was a way of using `while' to do the slicing?

I have this definition which might help you. It has no library dependencies and uses a while loop as requested:

If you have 2 lists; unix timestamps and values, each the same length where:

timestamps[0] is the time stamp for values[0] respectively.

timestamps = [unix, unix, unix, ....etc.]
values = [0.1, 0.2, 0.5, 1.1, ....etc.]

lets say you have 30 days of data, starting Nov 2011, and you want it grouped hourly:

BEGIN = 1320105600

hourly_values = []
z = 0
while z < 720:   # 24 hours * 30 days = 720
    hourly_values.append([])  # append an new empty list for each hour
    for i in range(len(timestamps)):
        if timestamps[i] >= (BEGIN + 3600*z):  # 3600 sec = 1 hour
            if timestamps[i] < (BEGIN + 3600*(z+1)):
                hourly_values[z].append(values[i])
    z+=1
return hourly_values

This will return a list of lists for each hour, with empty lists in the hours with no data.

查看更多
Rolldiameter
6楼-- · 2019-04-08 02:17

Assuming that the events are available in a chronologically ordered list called events, having a datetime attribute called timestamp:

interval = 10    # min
period = 2*24*60 # two days in minutes
timeslots = [[] for slot in range(period/interval)]
for e in events:
    index = int((e.timestamp-events[0].timestamp).total_seconds()/60) / interval
    timeslots[index].append(e)

This uses the first event as t=0 on the timeline. If that's not what you want, just substitute events[0].timestamp with a reference to a datetime instance that represents your t=0.

查看更多
甜甜的少女心
7楼-- · 2019-04-08 02:19

You could use the slotter module. I had a similar problem and I ended up writing a generic solution - https://github.com/saurabh-hirani/slotter

An asciinema demo - https://asciinema.org/a/8mm8f0qqurk4rqt90drkpvp1b?autoplay=1

查看更多
登录 后发表回答