i am implementing a rolling sum calculation through dplyr, but in my database i have a number of variables that have only one or only a few observations, causing an (k is smaller than n) error. i have tried to resolve this in thisj example with filter and merge, but wondering if there is a way to do this more elegantly and automatically within dplyr. please see the example below
#create data
dg = expand.grid(site = c("Boston","New York"),
year = 2000:2004)
dg$animal="dog"
dg$animal[10]="cat";dg$animal=as.factor(dg$animal)
dg$count = rpois(dim(dg)[1], 5)
If i would run the code below, because i only have one row with "cat", one gets the (Error: k <= n is not true) error
#running average
dg2 = dg %>%
arrange(site,year,animal) %>%
group_by(site,animal) %>%
# filter(animal=="dog") %>%
mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))
i have tried to solve this by using the following code, which filters out the "cat" value and does a subsequent merge, but I was wondering whether one can do this directly inside dplyr, especially as in this solution one would have to specify / know the number of unique rows for each variable in advance and adjust manually if one would change the range of the rolling sum, etc.
dg2 = dg %>%
arrange(site,year,animal) %>%
group_by(site,animal) %>%
filter(animal=="dog") %>%
mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))
merge(dg,dg2,c("site", "year","animal","count"),all.x=TRUE)
site year animal count roll_sum
1 Boston 2000 dog 5 NA
2 Boston 2001 dog 6 11
3 Boston 2002 dog 6 12
4 Boston 2003 dog 5 11
5 Boston 2004 dog 3 8
6 New York 2000 dog 8 NA
7 New York 2001 dog 3 11
8 New York 2002 dog 12 15
9 New York 2003 dog 3 15
10 New York 2004 cat 3 NA
Many thanks - W