I have a Date
, and am interested in representing it as an integer of yyyymm
form. Currently, I do:
get_year_month <- function(d) { return(as.integer(format(d, "%Y%m")))}
mydate = seq.Date(from=as.Date("2012-01-01"), to=as.Date("5012-01-01"), by=1)
system.time(ym <- get_year_month(mydate))
# user system elapsed
# 5.972 0.974 6.951
This is very slow for large datasets. Is there a faster way? Please provide timings for your answers so they can be easily compared. Use the above example.
Using functions from the lubridate
package can be almost twice as fast as your function :
mydate = as.Date(rep("2012-01-01",1000))
library(lubridate)
library(microbenchmark)
microbenchmark(get_year_month(mydate),
year(mydate)*100+month(mydate))
gives :
R> Unit: milliseconds
expr min lq median uq
get_year_month(mydate) 2.150296 2.188370 2.218176 2.285973
year(mydate) * 100 + month(mydate) 1.220016 1.228129 1.239704 1.284568
It would be best to keep your Dates in POSIXlt
format if you want to manipulate them like that:
> system.time(ym <- get_year_month(mydate))
user system elapsed
4.039 0.025 4.079
> system.time(mydatep <- as.POSIXlt(mydate))
user system elapsed
3.576 0.016 3.603
> system.time(ym <- (1900 + mydatep$year)*100 + (mydatep$mon + 1))
user system elapsed
0.010 0.005 0.015
It's still a little faster, and you get subsequent similar operations for free, in terms of time.
You can try using yearmon
class from zoo
package. In general if you are doing timeseries manipulation and analysis, I would suggest using xts
or atleast zoo
class. xts
has lot of functionality for analysis of very huge timeseries data.
Here is quick benchmark against other suggested solutions.
get_year_month <- function(d) {
return(as.integer(format(d, "%Y%m")))
}
mydate = as.Date(rep("2012-01-01", 1e+06))
microbenchmark(get_year_month(mydate), year(mydate) * 100 + month(mydate), as.yearmon(mydate, format = "%Y-%m-%d"), times = 1)
## Unit: milliseconds
## expr min lq median uq max neval
## get_year_month(mydate) 1049.8813 1049.8813 1049.8813 1049.8813 1049.8813 1
## year(mydate) * 100 + month(mydate) 434.1765 434.1765 434.1765 434.1765 434.1765 1
## as.yearmon(mydate, format = "%Y-%m-%d") 249.6704 249.6704 249.6704 249.6704 249.6704 1
There may not be a faster way for a single item. However you can make a version of the function that operates on collections run much faster than linearly by using builtin replicate e.g.
function mydate(D) {
x <- replicate(dim(D)[0], get_year_month(..)
return(x)
}