可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

i am implementing a rolling sum calculation through dplyr, but in my database i have a number of variables that have only one or only a few observations, causing an (k is smaller than n) error. i have tried to resolve this in thisj example with filter and merge, but wondering if there is a way to do this more elegantly and automatically within dplyr. please see the example below

    #create data
    dg = expand.grid(site = c("Boston","New York"),
                     year = 2000:2004)
    dg$animal="dog"
    dg$animal[10]="cat";dg$animal=as.factor(dg$animal)
    dg$count = rpois(dim(dg)[1], 5)

If i would run the code below, because i only have one row with "cat", one gets the (Error: k <= n is not true) error

#running average
dg2 = dg %>%
  arrange(site,year,animal) %>%
  group_by(site,animal) %>%
#   filter(animal=="dog") %>%
  mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))

i have tried to solve this by using the following code, which filters out the "cat" value and does a subsequent merge, but I was wondering whether one can do this directly inside dplyr, especially as in this solution one would have to specify / know the number of unique rows for each variable in advance and adjust manually if one would change the range of the rolling sum, etc.

dg2 = dg %>%
  arrange(site,year,animal) %>%
  group_by(site,animal) %>%
  filter(animal=="dog") %>%
  mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))

merge(dg,dg2,c("site", "year","animal","count"),all.x=TRUE)

       site year animal count roll_sum
1    Boston 2000    dog     5       NA
2    Boston 2001    dog     6       11
3    Boston 2002    dog     6       12
4    Boston 2003    dog     5       11
5    Boston 2004    dog     3        8
6  New York 2000    dog     8       NA
7  New York 2001    dog     3       11
8  New York 2002    dog    12       15
9  New York 2003    dog     3       15
10 New York 2004    cat     3       NA

Many thanks - W

回答1:

You can instead use RcppRoll::roll_sum which returns NA if the sample size(n) is less than the window size(k).

set.seed(1)
dg$count = rpois(dim(dg)[1], 5) 
library(RcppRoll)
library(dplyr)
dg %>%
     arrange(site,year,animal) %>%
     group_by(site, animal) %>%
     mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA))    
#       site year animal count roll_sum
#1    Boston 2000    dog     4       NA
#2    Boston 2001    dog     5        9
#3    Boston 2002    dog     3        8
#4    Boston 2003    dog     9       12
#5    Boston 2004    dog     6       15
#6  New York 2000    dog     4       NA
#7  New York 2001    dog     8       12
#8  New York 2002    dog     8       16
#9  New York 2003    dog     6       14
#10 New York 2004    cat     2       NA

回答2:

roll_Sum from RcppRoll will return an NA in place of an error wherever the number of data points are the less than the window size.

However, in case you want to return the sum of the number of data points present - even if lesser than the window the size, you can use the rollapplyr function from zoo.

library(zoo)
library(dplyr)

   dg %>%
         arrange(site,year,animal) %>%
         group_by(site, animal) %>%
         mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA)) %>%
         mutate(rollapply_sum =rollapplyr(count, 2, sum, partial = TRUE) )

Rollapply_sum will return the original value or the sum of data points present, even if its less than the window size instead of returning NA.

       site  year animal count roll_sum rollapply_sum
     (fctr) (int) (fctr) (int)    (dbl)         (int)
1    Boston  2000    dog     4       NA             4
2    Boston  2001    dog     5        9             9
3    Boston  2002    dog     3        8             8
4    Boston  2003    dog     9       12            12
5    Boston  2004    dog     6       15            15
6  New York  2000    dog     4       NA             4
7  New York  2001    dog     8       12            12
8  New York  2002    dog     8       16            16
9  New York  2003    dog     6       14            14
10 New York  2004    cat     2       NA             2

回答3:

 library(dplyr)
 dg %>% 
   arrange(site,year,animal) %>% 
   group_by(site,animal) %>% 
   mutate(rollsum=cumsum(count))

R dplyr rolling sum

问题:

回答1:

回答2:

回答3:

收藏的人(0)

R dplyr rolling sum

问题:

回答1:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮