Creating the mean average of every nth object in a

Posted 2020-05-01 09:59

I am trying to average every n-th element of a specific column in a data frame using the code below. I understand that a for loop is computationally inefficient, so I would like to ask whether there is a more efficient way to compute the average of every n-th row. My data looks roughly like this:

set.seed(6218)
n <- 8760
s1 <- sample(30000:70000, n)
s2 <- sample(0:10000, n)
inDf <- cbind(s1, s2)

EDIT:

I call h_average like this: h_average(inDf, 24, 1, 1). This means that I average the first point of every 24-point subset, i.e. the points 1, 25, 49, 73, ... I also do this only for the first column.
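To make the selection concrete, here is a small sketch of the row indices that a call such as h_average(inDf, 24, 1, 1) visits. The name idx is only illustrative; the values follow from seq() with the n, tstep and h given in the question.

```r
n     <- 8760  # number of rows, as in the question
tstep <- 24    # hours per day
h     <- 1     # hour of interest

# Row indices visited by the loop: h, h + tstep, h + 2 * tstep, ...
idx <- seq(h, n, by = tstep)

head(idx)
## [1]   1  25  49  73  97 121
length(idx)
## [1] 365
```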

Thanks in advance, BenR

#' h_average
#' 
#' Computing the average of every first, second, third, ... hour of the day/week
#' 
#' @param data merged data
#' @param tstep hour-step representing the number of hours for a day/week
#' @param h hour, which should be averaged. Should be between 1 - 24/1 - 168.
#' @param x column number
#' @return mean average of the specific hour
h_average <- function(data, tstep, h, x) {
  sum_1 <- 0
  sum_2 <- 0
  mean  <- 0

  for (i in seq(h, nrow(data), tstep)){
    if(data[i,x]){
      sum_1 <- sum_1 + 1
      sum_2 <- sum_2 + data[i,x]
    }
  }
  mean <- sum_2/sum_1
  return(mean)
}

Tags: r average
2 answers
家丑人穷心不美
#2 · 2020-05-01 10:27

If the question is how to reproduce h_average without the loop, then:

1) colMeans. Try this:

# assume inDf and h_average as defined in the question

tstep <- 24
h <- x <- 1

h_average(inDf, tstep, h, x)
##       s1 
## 49299.09 

# same but without loop
colMeans(inDf[seq(h, nrow(inDf), tstep), x, drop = FALSE])
##       s1 
## 49299.09 

This also works if x is a vector of column numbers, e.g. x = 1:2.
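A minimal sketch of the multi-column case, assuming the data is built as in the question. The names rows and res are only illustrative, and no output is shown because the sampled values depend on the R version's sample() implementation.

```r
set.seed(6218)
n <- 8760
s1 <- sample(30000:70000, n)
s2 <- sample(0:10000, n)
inDf <- cbind(s1, s2)

tstep <- 24
h <- 1
x <- 1:2  # both columns at once

rows <- seq(h, nrow(inDf), by = tstep)        # rows 1, 25, 49, ...
res  <- colMeans(inDf[rows, x, drop = FALSE]) # named vector, one mean per column
res
```

One caveat: the loop in h_average skips entries for which data[i, x] is 0, because if (0) is FALSE, whereas colMeans includes them. For a column that contains zeros in the selected rows, such as s2 potentially, the two results can therefore differ.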

1a) This variation works too:

colMeans(inDf[seq_len(tstep) == h, x, drop = FALSE])
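Variation (1a) relies on R recycling a logical row index that is shorter than the number of rows. A minimal sketch on a small made-up matrix (m is hypothetical, not the question's data) comparing the two ways of indexing:

```r
tstep <- 4
h     <- 2
m     <- cbind(a = 1:12, b = 101:112)  # toy matrix with 12 rows

# Logical index of length tstep, recycled along the 12 rows
by_logical <- m[seq_len(tstep) == h, , drop = FALSE]

# Explicit integer index, as in (1)
by_seq <- m[seq(h, nrow(m), by = tstep), , drop = FALSE]

identical(by_logical, by_seq)  # TRUE: both select rows 2, 6 and 10
colMeans(by_logical)           # a = 6, b = 106
```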

2) aggregate. Another possibility is this:

DF <- as.data.frame(inDf)  # aggregate needs a data frame; inDf is a matrix
aggregate(DF[x], list(h = gl(tstep, 1, nrow(inDf))), mean)[h, ]

which has the advantage that both x and h may be vectors, e.g.

x <- 1:2
h <- 1:3

DF <- as.data.frame(inDf)
aggregate(DF[x], list(h = gl(tstep, 1, nrow(inDf))), mean)[h, ]
##   h       s1       s2
## 1 1 49299.09 4964.277
## 2 2 49661.34 5177.910
## 3 3 49876.77 4946.447

To get the result for all values of h, use h <- 1:tstep or just omit [h, ].
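As a further sketch, and not part of the approaches above, the means for all hours of a single column can also be obtained by reshaping the column into a matrix with tstep rows and taking rowMeans. This assumes nrow(inDf) is an exact multiple of tstep (8760 = 24 * 365 here); hourly is just an illustrative name.

```r
set.seed(6218)
n <- 8760
s1 <- sample(30000:70000, n)
s2 <- sample(0:10000, n)
inDf <- cbind(s1, s2)

tstep <- 24
stopifnot(nrow(inDf) %% tstep == 0)  # the reshape only works for whole days

# matrix() fills column by column, so each column is one day
# and row h holds hour h of every day.
hourly <- rowMeans(matrix(inDf[, "s1"], nrow = tstep))

hourly[1]  # mean of rows 1, 25, 49, ... of column s1
```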

Note: inDf as defined in the question is a matrix, not a data frame as its name seems to suggest.
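The distinction matters for single-bracket indexing such as DF[x]. A short sketch on made-up toy values (not the question's data):

```r
s1 <- 1:6
s2 <- 11:16
inDf <- cbind(s1, s2)      # cbind of two vectors gives a matrix

is.matrix(inDf)            # TRUE
is.data.frame(inDf)        # FALSE

DF <- as.data.frame(inDf)  # convert before using aggregate

inDf[1]  # a single element (1): the matrix is indexed like a vector
DF[1]    # a one-column data frame containing s1
```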

Update: Made some improvements to (1) and added (1a) and (2).

对你真心纯属浪费
#3 · 2020-05-01 10:32

Just use a combination of rowMeans and subsetting, something like:

n = 5
rowMeans(data[seq(1, nrow(data), n),])

Alternatively, you could use apply:

## rowMeans is better, but if you wanted to calculate
## the median (say), just change mean to median below
apply(data[seq(1, nrow(data), n),], 1, mean)
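Note that rowMeans returns one mean per selected row, taken across the columns, whereas the question's h_average averages a single column over the selected rows. A small sketch on made-up data (the matrix data and the name sel are hypothetical) showing the difference:

```r
data <- cbind(s1 = c(10, 20, 30, 40, 50, 60),
              s2 = c( 1,  2,  3,  4,  5,  6))
n <- 3

sel <- data[seq(1, nrow(data), n), , drop = FALSE]  # rows 1 and 4

rowMeans(sel)  # one value per selected row: 5.5 and 22
colMeans(sel)  # one value per column: s1 = 25, s2 = 2.5
```

If the goal is the per-column average of every n-th row, as in the question, colMeans (or apply with margin 2) is the matching choice.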