I have a Date, and am interested in representing it as an integer of the form yyyymm. Currently, I do:
get_year_month <- function(d) { return(as.integer(format(d, "%Y%m")))}
mydate = seq.Date(from=as.Date("2012-01-01"), to=as.Date("5012-01-01"), by=1)
system.time(ym <- get_year_month(mydate))
# user system elapsed
# 5.972 0.974 6.951
This is very slow for large datasets. Is there a faster way? Please provide timings for your answers so they can be easily compared. Use the above example.
Using functions from the lubridate package can be almost twice as fast as your function.
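A minimal sketch of what the lubridate route could look like, assuming the package is installed. The function name get_year_month_lub is made up for illustration, and no timings are claimed here; wrap the call in system.time() on your own data to compare.

```r
# Sketch only: assumes the lubridate package is installed.
library(lubridate)

# Hypothetical helper: build yyyymm arithmetically from the year and
# month components instead of formatting to a string and parsing back.
get_year_month_lub <- function(d) {
  as.integer(year(d) * 100 + month(d))
}

d <- as.Date(c("2012-01-01", "2012-12-31"))
ym <- get_year_month_lub(d)
```

The idea is simply to avoid the round trip through character strings that format() requires.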
It would be best to keep your Dates in POSIXlt format if you want to manipulate them like that. It's still a little faster, and you get subsequent similar operations for free, in terms of time.
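Something along these lines shows the POSIXlt idea in base R. The helper name get_year_month_lt is hypothetical, and this is a sketch rather than a benchmarked solution.

```r
# Sketch: derive yyyymm from POSIXlt fields instead of formatting strings.
# get_year_month_lt is a hypothetical name used for illustration.
get_year_month_lt <- function(d) {
  lt <- as.POSIXlt(d)  # convert once; the fields can be reused afterwards
  # $year counts years since 1900 and $mon is zero-based (0 = January)
  as.integer((lt$year + 1900L) * 100L + lt$mon + 1L)
}

d <- as.Date(c("2012-01-01", "2012-12-31", "1999-07-15"))
ym <- get_year_month_lt(d)
```

If the dates are stored as POSIXlt to begin with, the conversion step drops out and later lookups such as lt$mday or lt$wday only read fields that are already there.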
You can try using the yearmon class from the zoo package. In general, if you are doing time series manipulation and analysis, I would suggest using the xts or at least the zoo class. xts has a lot of functionality for analysis of very large time series data. Here is a quick benchmark against other suggested solutions.
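As a minimal sketch of the yearmon conversion itself (not the benchmark), assuming zoo is installed: the helper name get_year_month_zoo and the final conversion to an integer are my own additions for illustration.

```r
# Sketch only: assumes the zoo package is installed.
library(zoo)

# Hypothetical helper: convert to yearmon, then format as yyyymm.
get_year_month_zoo <- function(d) {
  as.integer(format(as.yearmon(d), "%Y%m"))
}

d <- as.Date(c("2012-01-01", "2012-12-31"))
ym <- get_year_month_zoo(d)
```

If you only need a year-month index for a zoo or xts series, you can keep the yearmon values as they are and skip the integer step.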
There may not be a faster way for a single item. However, you can make a version of the function that operates on collections run much faster than linearly by using the built-in replicate, e.g.