Question:

I have a data frame on which I calculate a run length encoding for a specific column. The values of the column, dir, are either -1, 0, or 1.

dir.rle <- rle(df$dir)

I then take the run lengths and compute segmented cumulative sums across another column in the data frame. I'm using a for loop, but I feel like there should be a way to do this more intelligently.

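ndx <- 1
for (i in seq_along(dir.rle$lengths)) {
    l <- dir.rle$lengths[i] - 1      # offset from the start of run i to its end
    s <- ndx                         # first row of the run
    e <- ndx + l                     # last row of the run
    tmp[s:e,]$cumval <- cumsum(df[s:e,]$val)
    ndx <- e + 1                     # the next run starts right after this one
}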

The run lengths of dir define the start, s, and end, e, for each run. The above code works but it does not feel like idiomatic R code. I feel as if there should be another way to do it without the loop.

Answer 1:

This can be broken down into a two-step problem. First, create an indexing column from the rle. Then group by that column and run cumsum within each group. The grouping can be done with any number of aggregation techniques. I'll show two options, one using data.table and the other using plyr.

library(data.table)
library(plyr)
#A data.table inherits from data.frame, so it can be used like one for most purposes
#Fake data
dat <- data.table(dir = sample(-1:1, 20, TRUE), value = rnorm(20))
dir.rle <- rle(dat$dir)
#Compute an indexing column to group by: run i is repeated lengths[i] times
dat <- transform(dat, indexer = rep(seq_along(dir.rle$lengths), dir.rle$lengths))


#What does the indexer column look like?
> head(dat)
     dir      value indexer
[1,]   1  0.5045807       1
[2,]   0  0.2660617       2
[3,]   1  1.0369641       3
[4,]   1 -0.4514342       3
[5,]  -1 -0.3968631       4
[6,]  -1 -2.1517093       4


#data.table approach
dat[, cumsum(value), by = indexer]

#plyr approach
ddply(dat, "indexer", summarize, V1 = cumsum(value))
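If the goal is to keep the running sum as a column next to the original rows, as the loop in the question does, the data.table approach can be sketched as follows. This is only a sketch: the seed and the column name cumval are assumptions made for illustration, and rleid() is a data.table helper available in recent versions of the package that numbers consecutive runs directly, so the separate rle step is not needed.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so the fake data is reproducible
dat <- data.table(dir = sample(-1:1, 20, TRUE), value = rnorm(20))

# rleid(dir) gives 1, 2, 3, ... for consecutive runs of equal values in dir.
# := adds the cumval column by reference, one cumsum per run.
dat[, cumval := cumsum(value), by = rleid(dir)]

head(dat)
```

Because := modifies dat in place, the result has one row per original row, which makes it easy to compare against the loop version.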

Answer 2:

Both Spacedman & Chase make the key point that a grouping variable simplifies everything (and Chase lays out two nice ways to proceed from there).

I'll just throw in an alternative approach to forming that grouping variable. It doesn't use rle and, at least to me, feels more intuitive. Basically, at each point where diff() detects a change in value, the cumsum that will form your grouping variable is incremented by one:

df$group <- c(0, cumsum(!(diff(df$dir)==0)))

# Or, equivalently
df$group <- c(0, cumsum(as.logical(diff(df$dir))))
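To see that the diff-based grouping variable marks the same runs as the rle-based one, here is a small check using the six dir values from the head(dat) output in the first answer. The variable names are made up for this illustration. The only difference is that the diff version starts counting at 0 and the rle version at 1.

```r
# dir values taken from the head(dat) output shown earlier
dir <- c(1, 0, 1, 1, -1, -1)

# Grouping variable from diff(): starts at 0, increases by one at every change
group_diff <- c(0, cumsum(diff(dir) != 0))

# Grouping variable from rle(): run i is repeated lengths[i] times
r <- rle(dir)
group_rle <- rep(seq_along(r$lengths), r$lengths)

group_diff                        # 0 1 2 2 3 3
group_rle                         # 1 2 3 3 4 4
all(group_diff + 1 == group_rle)  # TRUE
```

Either vector works as a grouping key, since only the run boundaries matter and not the labels themselves.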

Answer 3:

Add a 'group' column to the data frame. Something like:

df=data.frame(z=rnorm(100)) # dummy data
df$dir = sign(df$z) # dummy +/- 1
rl = rle(df$dir)
df$group = rep(1:length(rl$lengths),times=rl$lengths)

Then use tapply to sum within groups (this returns one total per run, not a running sum):

tapply(df$z,df$group,sum)
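For the cumulative sum within each run that the question asks for, one base R option is ave() with FUN = cumsum, which returns a vector aligned with the rows of the data frame. The sketch below reuses the dummy data from this answer; the seed and the names cumval and expected are assumptions added for illustration. It also cross-checks the result against an explicit loop over the runs.

```r
set.seed(1)                        # arbitrary seed for reproducible dummy data
df <- data.frame(z = rnorm(100))
df$dir <- sign(df$z)
rl <- rle(df$dir)
df$group <- rep(seq_along(rl$lengths), times = rl$lengths)

# Running total within each run, returned in the original row order
df$cumval <- ave(df$z, df$group, FUN = cumsum)

# Cross-check against an explicit loop over the run boundaries
expected <- numeric(nrow(df))
e <- cumsum(rl$lengths)            # last row of each run
s <- e - rl$lengths + 1            # first row of each run
for (i in seq_along(s)) {
  expected[s[i]:e[i]] <- cumsum(df$z[s[i]:e[i]])
}
all.equal(df$cumval, expected)     # TRUE
```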
