linear interpolation with dplyr but skipping group

Posted 2020-03-30 00:23

Question:

I'm trying to linearly interpolate values within a group using dplyr and approx(). Unfortunately, some of the groups have all missing values, so I'd like the approximation to just skip those groups and proceed with the remainder. I don't want to extrapolate or use the nearest neighbouring observation's data.

Here's an example of the data. The first group (by id) has all missing values; the other should be interpolated.

data <- read.csv(text="
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

library(dplyr)

dataIpol <- data %>%
  group_by(id) %>%
  arrange(id, year) %>%
  mutate(valueIpol = approx(year, value, year,
                            method = "linear", rule = 1, f = 0, ties = mean)$y)

But then I get this error:

Error: need at least two non-NA values to interpolate

I don't get this error if I remove the groups that have all missing values, but that's not feasible.

Answer 1:

We can fix this by adding a filter step that keeps only the groups with at least two non-NA values, which is the minimum approx() needs to interpolate:

library(dplyr)
dataIpol <- data %>%
  group_by(id) %>% 
  arrange(id, year) %>%
  filter(sum(!is.na(value)) >= 2) %>%  # keep only groups with at least 2 non-NA values
  mutate(valueIpol = approx(year, value, year, 
                            method = "linear", rule = 1, f = 0, ties = mean)$y)

Here we count the non-NA entries in the value column within each group and drop any group that has fewer than two. Note that the dropped groups (c1 in the example) do not appear in the result at all.
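If the rows of the all-missing groups need to stay in the output, one possible variation is to make the interpolation conditional inside mutate() instead of filtering. The following is only a sketch, not part of the answer above: it assumes that leaving valueIpol as NA for such groups is an acceptable way to "skip" them, and it reuses the example data from the question.

```r
library(dplyr)

# Example data from the question: c1 is entirely missing, c2 has gaps.
data <- read.csv(text = "
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

dataIpol <- data %>%
  arrange(id, year) %>%
  group_by(id) %>%
  # Interpolate only when the group has at least two non-NA values;
  # otherwise fill the new column with NA (assumed behaviour for skipped groups).
  mutate(valueIpol = if (sum(!is.na(value)) >= 2) {
    approx(year, value, xout = year, method = "linear", rule = 1)$y
  } else {
    NA_real_
  }) %>%
  ungroup()
```

Because the condition is evaluated once per group, approx() is never called on a group it cannot handle, and all eight rows are kept. With rule = 1, points outside the observed range of a group would still come back as NA rather than being extrapolated.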