I have a data frame with 2 variables. The first column "X" represents date and time in the format dd/mm/yyyy hh:mm; the values in the second column "Y" are electricity meter readings, taken every 5 minutes. Now I want to add up the values for each half hour. For instance:
X Y
13/12/2014 12:00 1
13/12/2014 12:05 2
13/12/2014 12:10 1
13/12/2014 12:15 2
13/12/2014 12:20 2
13/12/2014 12:25 1
At the end I want to present a result such as:
13/12/2014 12:00 9
13/12/2014 12:30 12
and so on...
Here's an alternative approach which actually takes X into account (as per OP comment).
First, we will make sure X is of proper POSIXct format so we can manipulate it correctly (I'm using the data.table package here for convenience)
library(data.table)
setDT(df)[, X := as.POSIXct(X, format = "%d/%m/%Y %R")]
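As a small aside on why the conversion matters, here is a base R sketch (no data.table needed). The two timestamps are made up for illustration, and tz = "UTC" is my own assumption to keep the result independent of the local time zone. Sorting the raw dd/mm/yyyy strings orders them as text, which is not chronological:

```r
# Hypothetical timestamps in the question's dd/mm/yyyy hh:mm layout
x_chr <- c("13/12/2014 12:10", "02/01/2015 09:00")

# Parse into POSIXct; tz = "UTC" is an assumption made for reproducibility
x_time <- as.POSIXct(x_chr, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Text ordering compares the day digits first, so January 2015 sorts before December 2014
ord_chr <- order(x_chr)

# POSIXct ordering compares actual points in time
ord_time <- order(x_time)
```

Here ord_chr and ord_time disagree, which is why ordering by X is only safe after the conversion.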
Then, we will aggregate by a running count of the rows whose minutes within X are 00 or 30, while summing Y and extracting the first value of X for each group. I've made a more complicated data set in order to illustrate more complicated scenarios (see below)
df[order(X), .(X = X[1L], Y = sum(Y)), by = cumsum(format(X, "%M") %in% c("00", "30"))]
# cumsum X Y
# 1: 0 2014-12-13 12:10:00 6
# 2: 1 2014-12-13 12:30:00 6
# 3: 2 2014-12-13 13:00:00 3
Data
df <- read.table(text = "X Y
'13/12/2014 12:10' 1
'13/12/2014 12:15' 2
'13/12/2014 12:20' 2
'13/12/2014 12:25' 1
'13/12/2014 12:30' 1
'13/12/2014 12:35' 1
'13/12/2014 12:40' 1
'13/12/2014 12:45' 1
'13/12/2014 12:50' 1
'13/12/2014 12:55' 1
'13/12/2014 13:00' 1
'13/12/2014 13:05' 1
'13/12/2014 13:10' 1", header = TRUE)
Some explanations
- The by expression:
  - format(X, "%M") gets the minutes out of X (see ?strptime)
  - The next step is to check whether they match 00 or 30 (using %in%)
  - cumsum separates these matched values into separate groups, which we aggregate by putting this expression into the by statement (see ?data.table)
- The jth expression (X = X[1L], Y = sum(Y)) simply gets the first value of X and the sum of Y for each group.
- The ith expression:
  - I've added order(X) in order to make sure the data set is properly ordered by date (one of the main reasons I've converted X to proper POSIXct format)
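To see the grouping step on its own, here is a minimal base R sketch of what the by expression computes. The sample values and the names x, y, is_boundary, grp and sums are made up for illustration, and tz = "UTC" is my assumption:

```r
# Hypothetical readings, deliberately starting off a half-hour boundary
x <- as.POSIXct(c("13/12/2014 12:10", "13/12/2014 12:25",
                  "13/12/2014 12:30", "13/12/2014 12:55",
                  "13/12/2014 13:00"),
                format = "%d/%m/%Y %H:%M", tz = "UTC")
y <- c(1, 2, 3, 4, 5)

# TRUE wherever the minute part is exactly 00 or 30
is_boundary <- format(x, "%M") %in% c("00", "30")

# The running count of boundaries seen so far acts as a group id:
# it increases by one each time a new half hour starts
grp <- cumsum(is_boundary)

# Sum y within each group (the base R analogue of Y = sum(Y) with by = grp)
sums <- tapply(y, grp, sum)
```

Note that this relies on x being sorted and on a reading existing exactly at each 00 or 30 mark; if such a reading is missing, the neighbouring half hours fall into the same group.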
For a better understanding of how data.table works, see some tutorials here
t1 <- tapply(df$Y, as.numeric(as.POSIXct(df$X, format = '%d/%m/%Y %H:%M')) %/% 1800, sum)
data.frame(time = as.POSIXct(as.numeric(names(t1))*1800 + 1800, origin = '1970-01-01'), t1)
t1 groups the values using integer division of the timestamp (in seconds) by 1800 (30 minutes)
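A small sketch of the same integer-division idea on made-up data, in case it helps. The sample values, the variable names and tz = "UTC" are my assumptions. The code above adds 1800 and so labels each bin with the end of its window; this variant leaves that out and labels each bin with its start, which seems to be the layout asked for in the question:

```r
# Hypothetical readings covering two half-hour windows
x <- as.POSIXct(c("13/12/2014 12:00", "13/12/2014 12:25",
                  "13/12/2014 12:30", "13/12/2014 12:55"),
                format = "%d/%m/%Y %H:%M", tz = "UTC")
y <- c(1, 2, 3, 4)

# Seconds since 1970-01-01 divided by 1800 with the remainder dropped:
# every timestamp inside the same 30-minute window gets the same bin number
bin <- as.numeric(x) %/% 1800

# Sum y per bin; the bin numbers end up as the names of the result
sums <- tapply(y, bin, sum)

# Multiply the bin number back by 1800 to recover the start of each window
start <- as.POSIXct(as.numeric(names(sums)) * 1800,
                    origin = "1970-01-01", tz = "UTC")

result <- data.frame(start = start, Y = as.vector(sums))
```

Unlike the cumsum approach, this does not need a reading exactly on the half hour, since the bin is derived from the clock time itself.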
Considering your data frame as df, you can try:
unname(tapply(df$Y, (seq_along(df$Y)-1) %/% 6, sum))
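To illustrate what this one-liner does, here is a sketch on a plain vector. The first six values are the Y column from the question; the last six are made up so that the second block sums to 12:

```r
# First six readings from the question, followed by six hypothetical ones
y <- c(1, 2, 1, 2, 2, 1,
       2, 2, 2, 2, 2, 2)

# Positions 1..6 map to block 0, positions 7..12 to block 1, and so on
block <- (seq_along(y) - 1) %/% 6

# Sum each block of six consecutive readings
sums <- unname(tapply(y, block, sum))
```

This groups purely by row position, so it assumes the rows are sorted by time, no 5-minute reading is missing, and the first row sits on a half-hour boundary. If any of that does not hold, the blocks drift away from the clock.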