Binning time series in R?

Posted 2020-07-27 05:38

Question:

I'm new to R. My data has 600k objects defined by three attributes: Id, Date and TimeOfCall.

TimeOfCall has an HH:MM:SS format (e.g. 00:00:00) and ranges from 00:00:00 to 23:59:59.

I want to bin the TimeOfCall attribute into 24 bins, each one representing an hourly slot (the first bin is 00:00:00 to 00:59:59, and so on).

Can someone talk me through how to do this? I tried using cut(), but apparently my format is not numeric. Thanks in advance!

Answer 1:

While you could convert to a formal time representation, in this case it might be easier to just use substr() to take the first two characters (the hour):

test <- c("00:00:01","02:07:01","22:30:15")
as.numeric(substr(test,1,2))
#[1]  0  2 22
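To get from those hour numbers to the 24 bins in the question, one option is a factor with all 24 levels, so hours with no calls still show up as zero when you count. A minimal sketch, assuming a data frame named `calls` with a character `TimeOfCall` column; `calls`, `HourBin` and the sample times are hypothetical stand-ins for your data:

```r
# Hypothetical sample data standing in for the 600k-row dataset
calls <- data.frame(
  Id = 1:5,
  TimeOfCall = c("00:00:01", "00:59:59", "02:07:01", "22:30:15", "23:59:59"),
  stringsAsFactors = FALSE
)

# Hour number 0-23 taken from the first two characters of "HH:MM:SS"
hour <- as.numeric(substr(calls$TimeOfCall, 1, 2))

# A factor with all 24 levels, so empty hours are kept as zero counts
calls$HourBin <- factor(hour, levels = 0:23)

# Number of calls in each hourly bin
counts <- table(calls$HourBin)
counts
```

Because the levels are fixed to 0:23, `table()` always returns 24 counts, even if some hours never occur in the data.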

Converting to a POSIXct date-time would also work, and might be handy if you plan on further calculations (time differences, etc.):

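# the missing date is filled in with the current date; tz = "UTC" avoids DST gaps
testtime <- as.POSIXct(test, format = "%H:%M:%S", tz = "UTC")
testtime
#[1] "2013-12-09 00:00:01 UTC" "2013-12-09 02:07:01 UTC" "2013-12-09 22:30:15 UTC"
as.numeric(format(testtime, "%H"))
#[1]  0  2 22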
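Since cut() only works on numbers, a route that stays close to what the question tried is to convert each time to seconds since midnight and cut that into 24 one-hour intervals. A sketch, assuming well-formed "HH:MM:SS" strings; `times` is sample data and the bin labels are just one illustrative choice:

```r
times <- c("00:00:01", "02:07:01", "22:30:15", "23:59:59")

# Split "HH:MM:SS" into a character matrix with one column per component
parts <- do.call(rbind, strsplit(times, ":", fixed = TRUE))

# Seconds since midnight
secs <- as.numeric(parts[, 1]) * 3600 +
  as.numeric(parts[, 2]) * 60 +
  as.numeric(parts[, 3])

# 25 break points give 24 intervals: [0, 3600), [3600, 7200), ..., [82800, 86400)
# right = FALSE closes each bin on the left, so 01:00:00 falls in the 01 bin
bins <- cut(secs,
            breaks = seq(0, 86400, by = 3600),
            right = FALSE,
            labels = sprintf("%02d:00:00-%02d:59:59", 0:23, 0:23))
bins
```

With `right = FALSE` the bins match the question's definition exactly: 00:00:00 to 00:59:59 is the first bin, and 23:59:59 still lands in the last one.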


Answer 2:

You can use the cut.POSIXt method, i.e. cut() with breaks = "hours", but you first need to coerce your data to a valid date-time object. Here I am using the handy hms() from lubridate to parse the times, and strftime() to get back the time format.

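library(lubridate)
x <- c("09:10:01", "08:10:02", "08:20:02", "06:10:03 ", "Collided at 9:20:04 pm")
# anchor the parsed times at today's midnight in UTC
x.t <- as.POSIXct(format(Sys.Date()), tz = "UTC") + period_to_seconds(hms(x))
# floor each time to the start of its hour and keep only the clock part
x.h <- strftime(cut(x.t, "hours"), format = "%H:%M:%S", tz = "UTC")

data.frame(x, x.h)

With both steps pinned to UTC, "09:10:01" lands in the 09:00:00 bin, "08:10:02" and "08:20:02" in 08:00:00, and "06:10:03 " in 06:00:00, instead of every bin being shifted by the local UTC offset.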
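If lubridate isn't available, a base-R sketch of the same flooring idea swaps cut() for trunc(): trunc(x, units = "hours") floors each date-time to the start of its hour, so format() can print the bin label directly. This assumes clean "HH:MM:SS" strings; unlike hms(), it won't pick times out of free text such as "Collided at ...". The `times` values are sample data:

```r
times <- c("09:10:01", "08:10:02", "08:20:02", "06:10:03", "00:45:00")

# Parse as date-times on today's date; tz = "UTC" avoids DST gaps and offsets
parsed <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")

# trunc() floors each date-time to the start of its hour (returns POSIXlt)
hour_start <- trunc(parsed, units = "hours")

# Keep only the clock part as the bin label
binned <- format(hour_start, "%H:%M:%S")
data.frame(times, binned)
```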