I am new to R and couldn't find a good solution to my problem. I have a time series in the following format:
>dates temperature depth salinity
>12/03/2012 11:26 9.7533 0.48073 37.607
>12/03/2012 11:56 9.6673 0.33281 37.662
>12/03/2012 12:26 9.6673 0.33281 37.672
The measurements are taken at an irregular frequency, every 15 or every 30 minutes depending on the period. I would like to calculate annual, monthly and daily averages for each of my variables, regardless of how many observations fall in a given day, month or year. I have read a lot about the packages zoo, timeseries, xts, etc., but I can't get a clear picture of what I need (maybe because I'm not skilled enough with R...).
I hope my post is clear; don't hesitate to tell me if it's not.
Convert your data to an xts object, then use apply.daily et al. to calculate whatever values you want.
library(xts)
d <- structure(list(dates = c("12/03/2012 11:26", "12/03/2012 11:56",
"12/03/2012 12:26"), temperature = c(9.7533, 9.6673, 9.6673),
depth = c(0.48073, 0.33281, 0.33281), salinity = c(37.607,
37.662, 37.672)), .Names = c("dates", "temperature", "depth",
"salinity"), row.names = c(NA, -3L), class = "data.frame")
x <- xts(d[,-1], as.POSIXct(d[,1], format="%m/%d/%Y %H:%M"))
apply.daily(x, colMeans)
# temperature depth salinity
# 2012-12-03 12:26:00 9.695967 0.3821167 37.647
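The "et al" covers the sibling helpers in xts, such as apply.monthly and apply.yearly. Here is a minimal sketch, assuming the same column layout as above. The values in `obs` are made up for illustration (they are not from the question) so that the series spans more than one month and year:

```r
library(xts)

# Hypothetical observations spanning two months and two years
obs <- data.frame(
  dates = c("01/15/2012 12:00", "01/15/2012 12:30",
            "02/15/2012 12:00", "01/15/2013 12:00"),
  temperature = c(9.0, 10.0, 11.0, 12.0),
  salinity = c(37.0, 37.2, 37.4, 37.6)
)

# Same conversion as above: drop the text column, index by parsed timestamps
x2 <- xts(obs[, -1], as.POSIXct(obs$dates, format = "%m/%d/%Y %H:%M"))

# One row of column means per calendar month and per calendar year
monthly <- apply.monthly(x2, colMeans)
yearly <- apply.yearly(x2, colMeans)
```

As in the daily output above, each result row is stamped with the time of the last observation in its period.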
I'd add the day, month and year into the data frame and then use aggregate().
First convert your dates column into a POSIXct object:
d$timestamp <- as.POSIXct(d$dates,format = "%m/%d/%Y %H:%M",tz ="GMT")
Then get the date (without the time of day) into a column called Date:
d$Date <- format(d$timestamp,"%y-%m-%d",tz = "GMT")
Next, aggregate by the date:
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ Date,
data = d,
FUN = mean)
Similarly, you can get the month into a column (let's call it M for month), and then...
d$M <- format(d$timestamp,"%B",tz = "GMT")
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ M,
data = d,
FUN = mean)
Or, if you want year-month:
d$YM <- format(d$timestamp,"%y-%B",tz = "GMT")
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ YM,
data = d,
FUN = mean)
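The question also asks for annual averages and for every variable, including depth. The same pattern should cover both. This is a sketch only: the column name `Y` is my own choice, and the third row of `obs` is made up (not from the question) so that there are two years to average over:

```r
# First two rows are from the question; the third is a hypothetical 2013 reading
obs <- data.frame(
  dates = c("12/03/2012 11:26", "12/03/2012 11:56", "01/15/2013 09:00"),
  temperature = c(9.7533, 9.6673, 8.5),
  depth = c(0.48073, 0.33281, 0.4),
  salinity = c(37.607, 37.662, 37.7)
)

obs$timestamp <- as.POSIXct(obs$dates, format = "%m/%d/%Y %H:%M", tz = "GMT")

# Four-digit year as the grouping key
obs$Y <- format(obs$timestamp, "%Y", tz = "GMT")

# Mean of all three variables per year
annual <- aggregate(cbind(temperature, depth, salinity) ~ Y,
                    data = obs,
                    FUN = mean)
```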
If you have any NA values in your data, you may need to account for those:
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ YM,
data = d,
function(x) mean(x,na.rm = TRUE))
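One caveat: the formula method of aggregate() defaults to na.action = na.omit, so any row with an NA in one variable is dropped for all variables before the function is called. If you want each variable averaged over its own non-missing values, pass na.action = na.pass as well. A small sketch with made-up numbers (the data frame `obs` is purely illustrative):

```r
# Hypothetical data: one temperature reading is missing
obs <- data.frame(
  YM = c("12-December", "12-December", "12-December"),
  temperature = c(10, NA, 12),
  salinity = c(30, 36, 33)
)

# Default na.action = na.omit: the whole second row is discarded,
# so its salinity value (36) is lost too
dropped <- aggregate(cbind(temperature, salinity) ~ YM,
                     data = obs,
                     FUN = mean)

# na.pass keeps every row; na.rm = TRUE then skips NAs column by column
kept <- aggregate(cbind(temperature, salinity) ~ YM,
                  data = obs,
                  FUN = function(x) mean(x, na.rm = TRUE),
                  na.action = na.pass)
```

Here `dropped$salinity` is the mean of 30 and 33 only, while `kept$salinity` uses all three salinity values.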
Finally, if you want to average by week, you can do that as well. First generate the week number, and then use aggregate() again.
d$W <- format(d$timestamp,"%W",tz = "GMT")
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ W,
data = d,
function(x) mean(x,na.rm = TRUE))
With %W, week 1 starts on the first Monday of the year (any days before it fall in week 00), and weeks run from Monday to Sunday.
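A quick way to check the week numbering is to format a few dates around New Year. This sketch uses January 2012, where the 1st was a Sunday. It also builds a combined year-week key, which I would suggest (my own addition, not part of the code above) when the data spans several years, since the week number alone would pool the same week from different years:

```r
# 1 Jan 2012 is a Sunday, 2 Jan the first Monday, 8 Jan the next Sunday
days <- as.Date(c("2012-01-01", "2012-01-02", "2012-01-08", "2012-01-09"))

# Week of the year with Monday as the first day of the week
weeks <- format(days, "%W")

# Year plus week, so that week 01 of 2012 and week 01 of 2013 stay separate
year_weeks <- format(days, "%Y-%W")

print(data.frame(day = days, week = weeks, year_week = year_weeks))
```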
Yet another method, using plyr:
df <- structure(list(dates = c("12/03/2012 11:26", "12/03/2012 11:56",
"12/03/2012 12:26"), temperature = c(9.7533, 9.6673, 9.6673),
depth = c(0.48073, 0.33281, 0.33281), salinity = c(37.607,
37.662, 37.672)), .Names = c("dates", "temperature", "depth",
"salinity"), row.names = c(NA, -3L), class = "data.frame")
library(plyr)
# Convert the dates column of df to POSIXct
df$dates <- as.POSIXct(df$dates, format="%m/%d/%Y %H:%M")
# Make new variables, year and month (computed from df, not from another object)
df <- transform(df, month=as.numeric(format(dates,"%m")), year=as.numeric(format(dates,"%Y")))
## According to year
ddply(df,.(year),summarize,meantemp=mean(temperature),meandepth=mean(depth),meansalinity=mean(salinity))
#   year meantemp meandepth meansalinity
# 1 2012 9.695967 0.3821167 37.647
## According to month
ddply(df,.(month),summarize,meantemp=mean(temperature),meandepth=mean(depth),meansalinity=mean(salinity))
#   month meantemp meandepth meansalinity
# 1    12 9.695967 0.3821167 37.647
The package hydroTSM provides multiple functions to create annual and other summaries:
daily2annual(x, ...)
subdaily2annual(x, ...)
monthly2annual(x, ...)
annualfunction(x, FUN, na.rm = TRUE, ...)