Finding time intervals per day from a list of timestamps in Python

Posted 2019-11-03 15:32

I am trying to compute time intervals per day from a list of Unix timestamps in Python. I have searched for similar questions on Stack Overflow but mostly found examples of computing deltas, or SQL solutions.

I have a list like this:

timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]

I can easily turn this list of timestamps into datetime objects using:

import datetime as dt
[dt.datetime.fromtimestamp(int(i)) for i in timestamps]

From there I could probably write fairly lengthy code that keeps track of the current day/month and checks whether the next item in the list falls on the same day/month. If it does, I look at the times, take the first and last from that day, and store the interval plus the day/month in a dictionary.
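The single-pass idea described above can be sketched without a chain of if/else checks by keeping a (first, last) pair per day in a dict. This is a minimal sketch, not a definitive implementation: `daily_spans` is a hypothetical helper name, and "day" is assumed to mean the local calendar day, as produced by `datetime.fromtimestamp`.

```python
import datetime as dt

def daily_spans(timestamps):
    """Map each local calendar date to its (first, last) timestamp.

    Assumes Unix timestamps in seconds; the date is taken in the machine's
    local time zone, the same way datetime.fromtimestamp() does it.
    """
    spans = {}
    for ts in timestamps:
        day = dt.datetime.fromtimestamp(ts).date()
        # Start a new day as (ts, ts); otherwise widen the existing pair.
        first, last = spans.get(day, (ts, ts))
        # min/max keeps the result correct even if the input is unsorted
        spans[day] = (min(first, ts), max(last, ts))
    return spans

# Usage with a few timestamps from the question
timestamps = [1176239419.0, 1177619954.0, 1177620812.0, 1177621082.0]
for day, (first, last) in sorted(daily_spans(timestamps).items()):
    print(day, last - first)  # interval length in seconds
```

Because the pair is updated with `min`/`max`, this version does not rely on the list being sorted, at the cost of one `fromtimestamp` call per timestamp.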

As I am fairly new to Python, I was wondering what the best way to do this in Python is, without excessive use of if/else statements.

Thank you in advance

Answer 1:

If the list is sorted, as it is in your case, then you can use itertools.groupby() to group the timestamps by day:

#!/usr/bin/env python
from datetime import date, timedelta
from itertools import groupby

# timestamps: the list from the question (Unix time in seconds)
epoch = date(1970, 1, 1)

result = {}
assert timestamps == sorted(timestamps)  # groupby only merges adjacent items
# ts // 86400 is the number of whole days since the epoch, i.e. the UTC day
for day, group in groupby(timestamps, key=lambda ts: ts // 86400):
    # store the interval + day/month in a dictionary.
    same_day = list(group)
    assert max(same_day) == same_day[-1] and min(same_day) == same_day[0]
    result[epoch + timedelta(day)] = same_day[0], same_day[-1]
print(result)

Output:

{datetime.date(2007, 4, 10): (1176239419.0, 1176239419.0),
 datetime.date(2007, 4, 11): (1176334733.0, 1176334733.0),
 datetime.date(2007, 4, 13): (1176445137.0, 1176445137.0),
 datetime.date(2007, 4, 26): (1177619954.0, 1177621082.0),
 datetime.date(2007, 4, 29): (1177838576.0, 1177838576.0),
 datetime.date(2007, 5, 5): (1178349385.0, 1178401697.0),
 datetime.date(2007, 5, 6): (1178437886.0, 1178437886.0),
 datetime.date(2007, 5, 11): (1178926650.0, 1178926650.0),
 datetime.date(2007, 5, 12): (1178982127.0, 1178982127.0),
 datetime.date(2007, 5, 14): (1179130340.0, 1179130340.0),
 datetime.date(2007, 5, 15): (1179263733.0, 1179264930.0),
 datetime.date(2007, 5, 19): (1179574273.0, 1179574273.0),
 datetime.date(2007, 5, 20): (1179671730.0, 1179671730.0),
 datetime.date(2007, 5, 30): (1180549056.0, 1180549056.0),
 datetime.date(2007, 6, 2): (1180763342.0, 1180763342.0),
 datetime.date(2007, 6, 9): (1181386289.0, 1181386289.0),
 datetime.date(2007, 6, 16): (1181990860.0, 1181990860.0),
 datetime.date(2007, 6, 27): (1182979573.0, 1182979573.0),
 datetime.date(2007, 7, 1): (1183326862.0, 1183326862.0)}

If there is only one timestamp on a given day, it appears twice (as both the first and the last timestamp).
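Note that the key `ts // 86400` groups by UTC days (the dates in the output are UTC dates), whereas `datetime.fromtimestamp` in the question works in the local time zone. If local calendar days are what you want, the same groupby idea can use the local date as the key. A sketch, where `local_day` and `spans_by_local_day` are hypothetical helper names:

```python
import datetime as dt
from itertools import groupby

def local_day(ts):
    # Calendar date of a Unix timestamp in the machine's local time zone
    return dt.datetime.fromtimestamp(ts).date()

def spans_by_local_day(timestamps):
    """Return {date: (first, last)} using itertools.groupby.

    groupby only merges *adjacent* items with equal keys, so the
    timestamps are sorted first.
    """
    result = {}
    for day, group in groupby(sorted(timestamps), key=local_day):
        same_day = list(group)
        result[day] = (same_day[0], same_day[-1])
    return result

# Usage: a few timestamps from the question
for day, (first, last) in sorted(spans_by_local_day(
        [1177619954.0, 1177620812.0, 1177621082.0, 1178349385.0]).items()):
    print(day, last - first)
```

Which dates a timestamp falls on depends on the machine's time zone, so the grouping (and the printed dates) can differ from the UTC-based output above.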

How would you then test whether the intervals of the last (e.g.) 5 entries in result are larger than those of the previous 14?

entries = sorted(result.items())
intervals = [(end - start) for _, (start, end) in entries]
print(max(intervals[-5:]) > max(intervals[-5-14:-5]))
# -> False
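The same check can be wrapped in a small function so that the window sizes become parameters. This is a sketch: `recent_exceeds_previous` is a hypothetical name, and it keeps the answer's max-versus-max comparison (if "every recent interval is larger than every earlier one" is intended, compare `min(last)` instead).

```python
def recent_exceeds_previous(intervals, recent=5, previous=14):
    """True if the largest of the last `recent` intervals is larger than
    the largest of the `previous` intervals just before them."""
    if recent < 1 or previous < 1:
        raise ValueError("window sizes must be at least 1")
    if len(intervals) < recent + previous:
        raise ValueError("not enough intervals for the requested windows")
    last = intervals[-recent:]
    before = intervals[-recent - previous:-recent]
    return max(last) > max(before)

# Per-day intervals in seconds (end - start) from the output above, in date order
per_day = [0.0, 0.0, 0.0, 1128.0, 0.0, 52312.0, 0.0, 0.0, 0.0, 0.0,
           1197.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(recent_exceeds_previous(per_day))  # False
```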


Answer 2:

You can use collections.defaultdict. It is surprisingly handy when you are trying to build a collection without an initial estimate of its size and members.

import datetime as dt
from collections import defaultdict

# Initialize a defaultdict with the type list.
# Accessing a key that doesn't exist inserts it with the default value for that type;
# here, accessing a non-existent key adds an empty list to the collection.
intervalsByDate = defaultdict(list)

for t in timestamps:
    d = dt.datetime.fromtimestamp(t)  # don't reuse the name dt: it is the module
    myDateKey = (d.year, d.month, d.day)  # year first, so keys sort chronologically
    # If the key doesn't exist, a new empty list is added
    intervalsByDate[myDateKey].append(t)

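After this, intervalsByDate is a dict whose values are lists of timestamps grouped by calendar date. For each date you can sort the timestamps and get the total interval. Iterating over a defaultdict works the same as iterating over a dict (defaultdict is a subclass of dict).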

output = {}
# .items() works in both Python 2 and 3 (iteritems() is Python 2 only)
for date, day_timestamps in intervalsByDate.items():
    sortedIntervals = sorted(day_timestamps)
    output[date] = sortedIntervals[-1] - sortedIntervals[0]

Now output is a map from dates to intervals in seconds (Unix timestamps are in seconds, not milliseconds). Do with it what you will!


EDIT

Is it normal that the keys are not sorted? I get keys from different months mixed together.

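Yes, because (hash) maps and dicts are inherently unordered. (Since Python 3.7, dicts preserve insertion order, but they still never sort their keys.)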

How could I, for example, select the first 24 days from a month, and then the last

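If I were set on keeping my answer as it is, I would perhaps look at this, which is an ordered default dict. However, you can change the data type of output to something that isn't a dict to suit your needs, for example a list, ordered by date.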
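A sketch of the list-based suggestion. It assumes the per-day intervals are keyed by `datetime.date` objects, which sort chronologically (tuple keys do too, but only when ordered as (year, month, day)). The sample values are a few per-day intervals from the first answer's output, and "first 24 days of a month" is assumed to mean day-of-month 1 to 24.

```python
import datetime as dt

# A few per-day intervals in seconds, keyed by datetime.date (sample values)
output = {
    dt.date(2007, 5, 15): 1197.0,
    dt.date(2007, 4, 26): 1128.0,
    dt.date(2007, 5, 5): 52312.0,
    dt.date(2007, 6, 2): 0.0,
}

# A list of (date, interval) pairs ordered by date
ordered = sorted(output.items())

# Entries that fall on days 1-24 of May 2007
may_first_24 = [(d, secs) for d, secs in ordered
                if (d.year, d.month) == (2007, 5) and d.day <= 24]
print(may_first_24)
```

Because `ordered` is a plain list, any other selection (for example the remaining days of the month) is just another filter over it.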



Answer 3:

Just subtract the 2 dates from each other. This results in a timedelta instance. See datetime.timedelta: https://docs.python.org/2/library/datetime.html#timedelta-objects

from datetime import datetime
delta = datetime.today() - datetime(year=2015, month=1, day=1)
# Actual printed values depend on when you execute this :-)
print(delta.days, delta.seconds, delta.microseconds)  # e.g. 49 50817 381000
print(delta.total_seconds())  # e.g. 4284417.381, which is 49*24*3600 + 50817 + 381000/1000000

Combining this with list slicing and zip gives you your solution. An example solution:

from datetime import datetime

timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]
timestamps_as_dates = [datetime.fromtimestamp(int(i)) for i in timestamps]
# Make couples of each timestamp with the next one
# timestamps_as_dates[:-1] -> all your timestamps but the last one
# timestamps_as_dates[1:]  -> all your timestamps but the first one
# zip them together so that first and second are one couple, then second and third, ...
intervals = zip(timestamps_as_dates[:-1], timestamps_as_dates[1:])
interval_timedeltas = [(later - earlier).total_seconds() for earlier, later in intervals]
# result = [95314.0, 110404.0, 1174817.0, 858.0, 270.0, 217494.0, 510809.0, 52312.0, 36189.0, 488764.0, 55477.0, 148213.0, 133393.0, 1197.0, 309343.0, 97457.0, 877326.0, 214286.0, 622947.0, 604571.0, 988713.0, 347289.0]
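The consecutive pairs above include gaps that cross midnight, so they are gaps between successive timestamps rather than per-day intervals. If only gaps within the same day are wanted, one option is to keep the pairs whose two ends share a calendar date and add them up per day. A sketch, assuming local calendar days as `fromtimestamp` produces them; `within_day_gap_totals` is a hypothetical name:

```python
import datetime as dt
from collections import defaultdict

def within_day_gap_totals(timestamps):
    """Sum the gaps between consecutive timestamps that fall on the same
    local calendar day. For each day this equals last - first."""
    ordered = sorted(timestamps)
    totals = defaultdict(float)
    for earlier, later in zip(ordered[:-1], ordered[1:]):
        day_earlier = dt.datetime.fromtimestamp(earlier).date()
        day_later = dt.datetime.fromtimestamp(later).date()
        if day_earlier == day_later:  # skip pairs that cross midnight
            totals[day_earlier] += later - earlier
    return dict(totals)

print(within_day_gap_totals([1177619954.0, 1177620812.0, 1177621082.0]))
```

Unlike the groupby answer, days with a single timestamp do not appear in the result at all, because they have no same-day pair.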

timedelta arithmetic also works for adding a period of time to, or subtracting it from, a date:

from datetime import datetime, timedelta
tomorrow = datetime.today() + timedelta(days=1)

I don't have a simple solution for adding or subtracting months or years.
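timedelta has no months or years because their length varies. A minimal standard-library sketch of month arithmetic is shown below; it assumes the convention of clamping the day to the last day of the target month (other conventions are possible), and `add_months` is a hypothetical name. The third-party python-dateutil package's `relativedelta` also provides month and year arithmetic.

```python
import calendar
import datetime as dt

def add_months(d, months):
    """Shift a date or datetime by a number of calendar months (may be
    negative), clamping the day to the last day of the target month,
    e.g. Jan 31 + 1 month -> Feb 28 (or Feb 29 in a leap year)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12  # floor division also handles negatives
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # (weekday, days_in_month)
    return d.replace(year=year, month=month, day=min(d.day, last_day))

print(add_months(dt.date(2015, 1, 31), 1))   # 2015-02-28
print(add_months(dt.date(2015, 1, 1), -1))   # 2014-12-01
print(add_months(dt.date(2016, 2, 29), 12))  # adding years: 2017-02-28
```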



Source: Finding time intervals per day from a list of timestamps in Python