From date-time to usable value in Python

Posted 2019-08-08 22:17

I need to make a histogram of events over a period of time. My dataset gives the time of each event in a format like 2013-09-03 17:34:04. How do I convert this into something I can plot in a histogram in Python? I know how to do it the other way around with the datetime and time modules.

By the way, my dataset contains more than 1,500,000 data points, so please only suggest solutions that can be automated with loops or something like that ;)

2 answers
chillily
#2 · 2019-08-08 22:59

Use time.strptime() to convert the local time string to a time.struct_time and then time.mktime(), which will convert the time.struct_time to the number of seconds since 1970-01-01 00:00:00, UTC.

#! /usr/bin/env python

import time

def timestr_to_secs(timestr):
    fmt = '%Y-%m-%d %H:%M:%S'
    time_struct = time.strptime(timestr, fmt)
    secs = time.mktime(time_struct)
    return int(secs)

timestrs = [
    '2013-09-03 17:34:04',
    '2013-09-03 17:34:05',
    '2013-09-03 17:35:04',
    '1970-01-01 00:00:00'
]

for ts in timestrs:
    print(ts, timestr_to_secs(ts))

I'm in timezone +10, and the output the above code gives me is:

2013-09-03 17:34:04 1378193644
2013-09-03 17:34:05 1378193645
2013-09-03 17:35:04 1378193704
1970-01-01 00:00:00 -36000

Of course, for histogram-making purposes you may wish to subtract a convenient base time from these numbers.
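As a rough sketch of that idea, the seconds values can be shifted by a base time and grouped into fixed-width bins before plotting. The function name bin_counts and the bin_width parameter are made up for this illustration, and the earliest event is assumed to be the base time; the sample values are the local-time results shown above.

```python
from collections import Counter

def bin_counts(secs, bin_width=60):
    """Count events per bin of bin_width seconds (hypothetical helper).

    The earliest event is used as the base time, so bin 0 starts there.
    """
    base = min(secs)
    return Counter((s - base) // bin_width for s in secs)

# Seconds values taken from the output above (timezone +10).
secs = [1378193644, 1378193645, 1378193704]
counts = bin_counts(secs)

for bin_index in sorted(counts):
    print(bin_index, counts[bin_index])
```

The resulting counts can be handed to any bar-plotting routine. A single pass like this should scale to a few million values without special handling.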


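Here's a better version, inspired by a comment by J. F. Sebastian. It uses calendar.timegm(), which interprets the struct_time as UTC, so the result does not depend on the machine's timezone or daylight saving rules.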

#! /usr/bin/env python

import time
import calendar

def timestr_to_secs(timestr):
    fmt = '%Y-%m-%d %H:%M:%S'
    time_struct = time.strptime(timestr, fmt)
    secs = calendar.timegm(time_struct)
    return secs

timestrs = [
    '2013-09-03 17:34:04',
    '2013-09-03 17:34:05',
    '2013-09-03 17:35:04',
    '1970-01-01 00:00:00'
]

for ts in timestrs:
    print(ts, timestr_to_secs(ts))

Output:

2013-09-03 17:34:04 1378229644
2013-09-03 17:34:05 1378229645
2013-09-03 17:35:04 1378229704
1970-01-01 00:00:00 0
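For comparison, here is a sketch of the same UTC conversion using the datetime module instead of time and calendar. It assumes Python 3 and that the strings really are UTC; the name timestr_to_utc_secs is made up for this example.

```python
from datetime import datetime, timezone

def timestr_to_utc_secs(timestr, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse timestr as a UTC time and return whole seconds since the epoch."""
    naive = datetime.strptime(timestr, fmt)
    # Assumption: the string is in UTC, so attach the UTC timezone explicitly.
    aware = naive.replace(tzinfo=timezone.utc)
    return int(aware.timestamp())

timestrs = [
    '2013-09-03 17:34:04',
    '2013-09-03 17:34:05',
    '2013-09-03 17:35:04',
    '1970-01-01 00:00:00',
]

# A list comprehension handles the whole dataset in one pass.
secs = [timestr_to_utc_secs(ts) for ts in timestrs]
print(secs)
```

This should give the same numbers as the calendar.timegm() version, regardless of the timezone of the machine running it.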

Whenever I think about the problems that can arise from using localtime() I'm reminded of this classic example that happened to a friend of mine many years ago.

A programmer who was a regular contributor to the FidoNet C_ECHO had written process control code for a brewery. Unfortunately, his code used localtime() instead of gmtime(), which had unintended consequences when the brewery computer automatically adjusted its clock at the end of daylight saving. On that morning, localtime 2:00 AM happened twice. So his program repeated the process that it had already performed the first time 2:00 AM rolled around, which was to initiate the filling of a rather large vat with beer ingredients. As you can imagine, the brewery floor was a mess. :)

孤傲高冷的网名
#3 · 2019-08-08 23:04

To handle time series with millions of points, you could try pandas:

#!/usr/bin/env python
from io import StringIO
import matplotlib.pyplot as plt # $ pip install matplotlib
import pandas as pd 

csv_file = StringIO(u"""time,A,B
2013-09-03 17:34:04,1,2
2013-09-03 17:34:05,3,4
2013-09-03 17:34:10,4,5
""")
df = pd.read_csv(csv_file, parse_dates=True, index_col='time')
df = df.cumsum()
df.plot()
plt.show()
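The example above plots cumulative sums of the columns. For a histogram of event counts, one possible approach is to round each timestamp down to a bin boundary and count how many fall into each bin. The sketch below assumes one-minute bins and uses sample strings in the question's format; the variable names are illustrative.

```python
import pandas as pd

# Sample event times in the format from the question.
timestrs = [
    '2013-09-03 17:34:04',
    '2013-09-03 17:34:05',
    '2013-09-03 17:35:04',
]

# Parse all strings at once; an explicit format avoids per-row guessing.
events = pd.to_datetime(pd.Series(timestrs), format='%Y-%m-%d %H:%M:%S')

# Round each timestamp down to the start of its minute, then count per minute.
counts = events.dt.floor('min').value_counts().sort_index()
print(counts)

# To draw the histogram (requires matplotlib):
# counts.plot(kind='bar')
```

Changing 'min' to another frequency string gives coarser or finer bins.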