I need to make a histogram of events over a period of time. My dataset gives me the time of each event in a format like 2013-09-03 17:34:04. How do I convert this into something I can plot in a histogram in Python? I know how to do it the other way around with the datetime and time commands.
By the way, my dataset contains more than 1,500,000 data points, so please only suggest solutions that can be automated with loops or something like that ;)
Use time.strptime() to convert the local time string to a time.struct_time, and then time.mktime(), which will convert the time.struct_time to the number of seconds since 1970-01-01 00:00:00, UTC.
I'm in timezone +10, and the output the above code gives me is:
Of course, for histogram-making purposes you may wish to subtract a convenient base time from these numbers.
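As a minimal sketch of the strptime/mktime approach with a base time subtracted, something like the following should work. The helper name to_epoch_seconds and the sample timestamps are made up for illustration, and the earliest event is used as the base time, which is only one possible choice.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"

def to_epoch_seconds(stamp, fmt=FMT):
    """Interpret stamp as local time and return seconds since the epoch."""
    return time.mktime(time.strptime(stamp, fmt))

# Hypothetical sample data; the real dataset would be read from a file.
stamps = [
    "2013-09-03 17:34:04",
    "2013-09-03 17:34:34",
    "2013-09-03 17:36:04",
]

seconds = [to_epoch_seconds(s) for s in stamps]

# Subtract a convenient base time (here, the earliest event) so the
# numbers are small offsets rather than large epoch values.
base = min(seconds)
offsets = [s - base for s in seconds]
print(offsets)
```

The absolute values in seconds depend on the machine's time zone, but the offsets do not, as long as no daylight saving transition falls between the events. The offsets list can then be handed to whatever histogram routine you prefer.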
Here's a better version, inspired by a comment by J. F. Sebastian.
output
Whenever I think about the problems that can arise from using localtime() I'm reminded of this classic example that happened to a friend of mine many years ago.
A programmer who was a regular contributor to the FidoNet C_ECHO had written process control code for a brewery. Unfortunately, his code used localtime() instead of gmtime(), which had unintended consequences when the brewery computer automatically adjusted its clock at the end of daylight saving. On that morning, localtime 2:00 AM happened twice. So his program repeated the process that it had already performed the first time 2:00 AM rolled around, which was to initiate the filling of a rather large vat with beer ingredients. As you can imagine, the brewery floor was a mess. :)
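One way to sidestep the repeated-hour problem in that story is to never go through local time at all. The sketch below uses calendar.timegm(), which is the UTC counterpart of time.mktime(). It assumes the timestamps in the dataset can be treated as UTC, which may not be true of your data, and it is not necessarily the same code as the improved version mentioned earlier. The function name utc_epoch_seconds is hypothetical.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"

def utc_epoch_seconds(stamp, fmt=FMT):
    """Return seconds since the epoch, treating stamp as UTC.

    Assumption: the string carries no time zone and is meant as UTC,
    so the result does not depend on the local zone or on DST changes.
    """
    return calendar.timegm(time.strptime(stamp, fmt))

first = utc_epoch_seconds("2013-09-03 17:34:04")
later = utc_epoch_seconds("2013-09-03 18:34:04")

# One hour of wall-clock time is always 3600 seconds in UTC.
print(first, later - first)
```

Because UTC has no daylight saving transitions, every timestamp maps to exactly one number, so no hour is ever counted twice or skipped.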
To handle time series with millions of points, you could try pandas:
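A rough sketch of what that could look like, assuming pandas is installed and the timestamps are already loaded as a list of strings. The sample data and variable names are invented for illustration, and daily bins are an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample data; the real dataset would have ~1.5 million rows.
raw = [
    "2013-09-03 17:34:04",
    "2013-09-03 18:02:11",
    "2013-09-04 09:15:00",
    "2013-09-06 23:59:59",
]

# Parse every string in one vectorized call; giving the format explicitly
# avoids per-row format guessing.
times = pd.to_datetime(pd.Series(raw), format="%Y-%m-%d %H:%M:%S")

# Build a series of ones indexed by event time, then sum per calendar day.
# resample() also produces the empty days, which show up as 0.
events = pd.Series(1, index=pd.DatetimeIndex(times))
events_per_day = events.resample("D").sum()

print(events_per_day)
```

The result is a series of counts per day, which is already the histogram data. If matplotlib is available, events_per_day.plot(kind="bar") should draw it directly.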