mean-before-after imputation in R

Posted 2020-04-14 08:16

Question:

I'm new to R. How can I impute a missing value using the mean of the values immediately before and after the missing data point?

Example:

Use the mean of the values directly above and below each NA as the imputed value.

- mean for row 3 is 25, i.e. (27 + 23) / 2

- mean for row 7 is 32.5, i.e. (32 + 33) / 2

age
52.0
27.0
NA
23.0
39.0
32.0
NA
33.0
43.0

Thank you.
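The expected values can be checked directly in base R: each NA is replaced by the mean of the element just before it and the element just after it. This quick sketch assumes every NA has an observed neighbour on both sides.

```r
age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
na_idx <- which(is.na(age))                # positions 3 and 7
(age[na_idx - 1] + age[na_idx + 1]) / 2    # (27 + 23) / 2 and (32 + 33) / 2
# [1] 25.0 32.5
```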

Answer 1:

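Here is a solution using na.locf from the zoo package. na.locf replaces each NA with the most recent non-NA value before it (last observation carried forward); with fromLast = TRUE it uses the next non-NA value after it instead. Averaging the two fills gives the before/after mean: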

library(zoo)
x <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
0.5*(na.locf(x, fromLast = TRUE) + na.locf(x))
[1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0

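The advantage shows when there is more than one consecutive NA: every NA in the run gets the mean of the nearest observed values on either side (here 39 and 33).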

x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
0.5*(na.locf(x, fromLast = TRUE) + na.locf(x))
[1] 52 27 25 23 39 36 36 33 43

EDIT: the rev argument is deprecated, so I replaced it with fromLast.
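One caveat with na.locf: by default (na.rm = TRUE) it drops leading NAs, so if the vector starts with NA the forward fill is shorter than x and the two fills no longer line up. A minimal sketch, assuming you want such edge NAs filled from the only neighbour that exists, keeps the full length with na.rm = FALSE and averages whatever is available:

```r
library(zoo)

x <- c(NA, 27, NA, 23, 39, NA, NA, 33, 43)   # note the leading NA

fwd <- na.locf(x, na.rm = FALSE)                  # last observation carried forward
bwd <- na.locf(x, fromLast = TRUE, na.rm = FALSE) # next observation carried backward

# average the two fills; at the edges only one of them is available
imputed <- rowMeans(cbind(fwd, bwd), na.rm = TRUE)
imputed
# [1] 27 27 25 23 39 36 36 33 43
```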



Answer 2:

This would be a basic manual approach you can take:

age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
age[is.na(age)] <- rowMeans(cbind(age[which(is.na(age))-1], 
                                  age[which(is.na(age))+1]))
age
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0

Or, since you seem to have a single-column data.frame:

mydf <- data.frame(age = c(52, 27, NA, 23, 39, 32, NA, 33, 43))

mydf[is.na(mydf$age), ] <- rowMeans(
  cbind(mydf$age[which(is.na(mydf$age))-1],
        mydf$age[which(is.na(mydf$age))+1]))
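The index arithmetic above assumes each NA is isolated and sits strictly inside the vector. With two consecutive NAs, age[i - 1] or age[i + 1] is itself NA, so rowMeans() returns NA; an NA in position 1 produces index 0, which R silently drops, so the two columns given to cbind() no longer line up; and an NA in the last position has no value after it. A base-R sketch that avoids these problems uses a hypothetical helper, impute_before_after(), which looks up the nearest observed value on each side with findInterval() (the same nearest-neighbour rule as the na.locf answer, without needing zoo):

```r
# Hypothetical helper: replace each NA with the mean of the nearest observed
# value before it and the nearest observed value after it. At either end of
# the vector only one side exists, so that side's value is used on its own.
impute_before_after <- function(x) {
  obs  <- which(!is.na(x))   # positions of observed values
  miss <- which(is.na(x))    # positions to fill

  # findInterval() counts how many observed positions are <= each missing one,
  # i.e. it gives the index (into obs) of the nearest observed value before it
  prev <- findInterval(miss, obs)
  nxt  <- prev + 1

  before <- rep(NA_real_, length(miss))
  after  <- rep(NA_real_, length(miss))
  has_before <- prev >= 1
  has_after  <- nxt <= length(obs)
  before[has_before] <- x[obs[prev[has_before]]]
  after[has_after]   <- x[obs[nxt[has_after]]]

  x[miss] <- rowMeans(cbind(before, after), na.rm = TRUE)
  x
}

impute_before_after(c(52, 27, NA, 23, 39, 32, NA, 33, 43))
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
impute_before_after(c(NA, 27, NA, 23, 39, NA, NA, 33, NA))
# [1] 27 27 25 23 39 36 36 33 33
```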


Answer 3:

Just another way:

age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
age[is.na(age)] <- apply(sapply(which(is.na(age)), "+", c(-1, 1)), 2, 
                         function(x) mean(age[x]))
age
## [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
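To see how this works, it helps to look at the intermediate matrix: sapply() calls "+"(i, c(-1, 1)) for every NA position i, giving one column of neighbour indices per NA, and apply(..., 2, ...) then averages age over each column. Like the previous answer, this assumes each NA is isolated and not at either end of the vector.

```r
age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
nbrs <- sapply(which(is.na(age)), "+", c(-1, 1))  # one column per NA
nbrs
#      [,1] [,2]
# [1,]    2    6
# [2,]    4    8
apply(nbrs, 2, function(idx) mean(age[idx]))       # mean of each column's neighbours
# [1] 25.0 32.5
```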


Answer 4:

You are looking for moving average imputation. The imputeTS package provides this as na_ma (named na.ma before imputeTS 3.0; the old dotted name is deprecated):

library(imputeTS)
x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
na_ma(x, k = 1, weighting = "simple")

[1] 52.00000 27.00000 25.00000 23.00000 39.00000 31.66667 38.33333 33.00000 43.00000

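For isolated NAs this gives exactly the requested before/after mean. For a run of NAs it differs from the na.locf answer: when the k = 1 window around an NA contains fewer than two observed values, the window is widened, which is why positions 6 and 7 get 31.67 and 38.33 rather than 36. The k parameter sets how many neighbours on each side are taken into account.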
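The values at positions 6 and 7 can be reproduced with a short base-R sketch. It is a hypothetical re-implementation for illustration, not imputeTS's own code, and it assumes this rule: start with a window of k values on each side of the NA (taken from the original, unimputed vector), widen it one step at a time until it contains at least two observed values, then take the plain mean.

```r
# Hypothetical re-implementation of "simple" moving-average imputation,
# for illustration only; imputeTS's own implementation may differ in details.
ma_impute_simple <- function(x, k = 1) {
  n <- length(x)
  out <- x
  for (i in which(is.na(x))) {
    w <- k
    repeat {
      # window of w positions on each side, clipped to the vector bounds;
      # it is always taken from the original x, never from imputed values
      window <- x[max(1, i - w):min(n, i + w)]
      # stop once at least two observed values are available (or the
      # window already covers the whole vector)
      if (sum(!is.na(window)) >= 2 || w >= n) break
      w <- w + 1
    }
    out[i] <- mean(window, na.rm = TRUE)
  }
  out
}

ma_impute_simple(c(52, 27, NA, 23, 39, NA, NA, 33, 43), k = 1)
# [1] 52.00000 27.00000 25.00000 23.00000 39.00000 31.66667 38.33333 33.00000 43.00000
```

Position 6 ends up averaging 23, 39 and 33 (window widened to k = 2), and position 7 averages 39, 33 and 43.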