I'm new to R. How can I impute a missing value using the mean of the values immediately before and after the missing data point?
Example: use the mean of the values directly above and below each NA as the imputed value.
-mean for row number 3 is 25.0
-mean for row number 7 is 32.5
age
52.0
27.0
NA
23.0
39.0
32.0
NA
33.0
43.0
Thank you.
Here is a solution using na.locf from the zoo package, which replaces each NA with the most recent non-NA value before it (or, with fromLast=TRUE, the next non-NA value after it).
library(zoo)
x <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
0.5*(na.locf(x,fromLast=TRUE) + na.locf(x))
[1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
The advantage of this approach is that it also works when there is more than one consecutive NA.
x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
0.5*(na.locf(x,fromLast=TRUE) + na.locf(x))
[1] 52 27 25 23 39 36 36 33 43
EDIT: the rev argument is deprecated, so I replaced it with fromLast.
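If zoo is not available, the same idea (average the value carried forward and the value carried backward) can be sketched in base R. This is only a minimal sketch: locf and impute_neighbour_mean are hypothetical helper names, not zoo functions, and NAs at the very start or end of the vector are assumed to stay NA because they have a neighbour on only one side.

```r
# Hypothetical helper (not part of zoo): carry the last observed value forward.
# Leading NAs stay NA because there is nothing to carry.
locf <- function(v) {
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))  # index of the latest non-NA
  idx[idx == 0] <- NA                                # no earlier value available
  v[idx]
}

# Hypothetical helper: mean of the nearest non-NA value on each side.
impute_neighbour_mean <- function(v) {
  forward  <- locf(v)            # nearest observed value before each position
  backward <- rev(locf(rev(v)))  # nearest observed value after each position
  ifelse(is.na(v), (forward + backward) / 2, v)
}

x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
impute_neighbour_mean(x)
# [1] 52 27 25 23 39 36 36 33 43
```

Both consecutive NAs get the same value (36) because they share the same nearest observed neighbours, 39 and 33.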
This would be a basic manual approach you can take:
age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
age[is.na(age)] <- rowMeans(cbind(age[which(is.na(age))-1],
age[which(is.na(age))+1]))
age
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
Or, since you seem to have a single-column data.frame:
mydf <- data.frame(age = c(52, 27, NA, 23, 39, 32, NA, 33, 43))
mydf[is.na(mydf$age), ] <- rowMeans(
cbind(mydf$age[which(is.na(mydf$age))-1],
mydf$age[which(is.na(mydf$age))+1]))
Just another way:
age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
age[is.na(age)] <- apply(sapply(which(is.na(age)), "+", c(-1, 1)), 2,
function(x) mean(age[x]))
age
## [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
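The index-based approaches above assume that every NA has a non-NA value directly before and after it. As a hedged alternative, here is a sketch using linear interpolation with approx from base R (the stats package). This is a different technique from the answers above: for an isolated NA the interpolated value equals the mean of the two neighbours, but for runs of consecutive NAs it gives evenly spaced values rather than one shared mean. The variable names observed and filled are only illustrative.

```r
age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)

observed <- !is.na(age)

# Interpolate linearly over the positions of the observed values.
# With the default rule = 1, NAs at the very start or end stay NA.
filled <- approx(x = which(observed), y = age[observed],
                 xout = seq_along(age))$y
filled
# [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
```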
You are looking for Moving Average Imputation - you can use the na.ma function of imputeTS for this.
library(imputeTS)
x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
na.ma(x, k=1, weighting = "simple")
[1] 52.00000 27.00000 25.00000 23.00000 39.00000 31.66667 38.33333 33.00000 43.00000
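For an isolated NA, such as position 3, this produces exactly the required result: the mean of the two neighbouring values. For the consecutive NAs at positions 6 and 7, the window is widened until it contains at least two non-NA values, so the results there are means over a larger neighbourhood.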
With the k parameter you specify how many neighbors on each side are taken into account for the calculation.
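To make the role of k more concrete, here is a rough base R sketch of a simple (unweighted) moving average imputation. It is not the imputeTS implementation. ma_impute is a hypothetical name, and the rule of widening the window until at least two observed values are found is an assumption that happens to reproduce the output shown above for this vector. Behaviour at the edges or with other weighting options may differ from the package.

```r
# Hypothetical helper, not imputeTS code: simple moving-average imputation.
ma_impute <- function(v, k = 1) {
  out <- v
  n <- length(v)
  for (i in which(is.na(v))) {
    w <- k
    repeat {
      # Positions up to w steps to the left and right of i, excluding i itself
      idx <- setdiff(max(1, i - w):min(n, i + w), i)
      vals <- v[idx]
      vals <- vals[!is.na(vals)]
      # Assumption: widen the window until it holds at least two observed values
      if (length(vals) >= 2 || w >= n) break
      w <- w + 1
    }
    out[i] <- mean(vals)  # NaN if the vector has no observed values at all
  }
  out
}

x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
ma_impute(x, k = 1)
# [1] 52.00000 27.00000 25.00000 23.00000 39.00000 31.66667 38.33333 33.00000 43.00000
```

For position 6, the window with k = 1 contains only one observed value (39), so it grows to two neighbours per side and averages 23, 39 and 33.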