Replace missing values with column mean

2019-01-04 10:20发布

I am not sure how to loop over each column to replace the NA values with the column mean. When I am trying to replace for one column using the following, it works well.

Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE))

The code for looping over columns is not working:

for(i in 1:ncol(data)){
    data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE))
}

the values are not replaced. Can someone please help me with this?

9条回答
淡お忘
2楼-- · 2019-01-04 11:11

If DF is your data frame of numeric columns:

library(zoo)
na.aggregate(DF)

ADDED:

Using only the base of R define a function which does it for one column and then lapply to every column:

NA2mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
replace(DF, TRUE, lapply(DF, NA2mean))

The last line could be replaced with the following if it's OK to overwrite the input:

DF[] <- lapply(DF, NA2mean)
查看更多
爷、活的狠高调
3楼-- · 2019-01-04 11:11

lapply can be used instead of a for loop.

d1[] <- lapply(d1, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

This doesn't really have any advantages over the for loop, though maybe it's easier if you have non-numeric columns as well, in which case

d1[sapply(d1, is.numeric)] <- lapply(d1[sapply(d1, is.numeric)], function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

is almost as easy.

查看更多
够拽才男人
4楼-- · 2019-01-04 11:13

There is also quick solution using the imputeTS package:

library(imputeTS)
na.mean(yourDataFrame)
查看更多
登录 后发表回答