Hi, I am looking to subset some minutely data by time. I normally use xts,
doing something like:
subset.string <- 'T10:00/T13:00'
xts.min.obj[subset.string]
to get all the rows between 10am and 1pm (inclusive) EACH DAY, with the output in xts format. But this is a bit slow for my purposes, e.g.:
j <- xts(rnorm(10e6),Sys.time()-(10e6:1))
system.time(j['T10:00/T16:00'])
user system elapsed
5.704 0.577 17.115
I know that data.table is very fast at subsetting large datasets, so I am wondering whether, in conjunction with the fasttime package to handle fast POSIXct creation, it would be worth creating a function like:
dt.time.subset <- function(xts.min.obj, subset.string){
require(data.table)
require(fasttime)
x.dt <- data.table(ts=format(index(xts.min.obj),"%Y-%m-%d %H:%M:%S %Z"),
coredata(xts.min.obj))
out <- x.dt[,some.subsetting.operation.using."%between%"]
xts(out,fastPOSIXct(out[,ts]))
}
The idea is to convert xts.min.obj into a data.table, add some sort of character index, use data.table to subset the relevant rows, and then use the output row index with fasttime to recreate an xts output. Or is this too many extra operations for something that is already highly optimised and written in C?
If you're OK with specifying your range in UTC, you can do:
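As a rough sketch of that UTC-based idea (not necessarily the exact code intended here): POSIXct values are stored as seconds since 1970-01-01 00:00:00 UTC, so the UTC time of day can be recovered with modulo arithmetic and compared against a range given in seconds after midnight. The helper name `utc.time.of.day.mask` and the sample index below are hypothetical, and the sketch uses base R only so that it runs without extra packages.

```r
# Hypothetical helper (not part of xts or data.table): TRUE where the UTC time
# of day of each timestamp lies in [from.sec, to.sec], both given as seconds
# after midnight UTC.
utc.time.of.day.mask <- function(idx, from.sec, to.sec) {
  # POSIXct is a count of seconds since the epoch in UTC, so the remainder
  # modulo 86400 is the number of seconds since midnight UTC
  secs <- as.numeric(idx) %% 86400
  secs >= from.sec & secs <= to.sec
}

# Illustrative data: one timestamp per minute over two days, from midnight UTC
idx <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 60 * (0:2879)

# Keep 10:00 to 13:00 UTC (inclusive) on each day
keep <- utc.time.of.day.mask(idx, 10 * 3600, 13 * 3600)
subset.idx <- idx[keep]
```

With an xts object `x`, the same logical mask could presumably be applied as `x[utc.time.of.day.mask(index(x), 10 * 3600, 13 * 3600)]`, which keeps the result as an xts object. Note that the range is in UTC, so a local-time window would need shifting by the UTC offset, and this simple shift does not account for daylight saving changes.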