Find and replace missing values with row mean

Posted 2019-01-18 13:00

I have a data frame with NAs and I want to replace the NAs with row means

c1 = c(1,2,3,NA)
c2 = c(3,1,NA,3)
c3 = c(2,1,3,1)

df = data.frame(c1,c2,c3)

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3 NA  3
4 NA  3  1

so that

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1

5 answers
乱世女痞
#2 · 2019-01-18 13:34

Very similar to @baptiste's answer

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
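For readers unfamiliar with matrix indexing, the two lines above can be unpacked as in the sketch below. The wrapper name `fill_na_row_mean` and the intermediate variable names are hypothetical, introduced only for illustration; the logic is the same as in the answer.

```r
# Hypothetical helper wrapping the two-line approach (name is illustrative only).
fill_na_row_mean <- function(d) {
  ind <- which(is.na(d), arr.ind = TRUE)   # one (row, col) pair per NA
  row_means <- rowMeans(d, na.rm = TRUE)   # one mean per row, NAs ignored
  d[ind] <- row_means[ind[, 1]]            # look up the mean of each NA's own row
  d
}

# Data from the question
df <- data.frame(c1 = c(1, 2, 3, NA), c2 = c(3, 1, NA, 3), c3 = c(2, 1, 3, 1))
filled <- fill_na_row_mean(df)
```

Because the row means are looked up by the row index of each NA, this should work for any number of columns and for rows with more than one NA.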
太酷不给撩
#3 · 2019-01-18 13:41

Another option is na.aggregate from library(zoo). It fills NAs with column means by default, so transpose the dataset first and transpose the result back:

library(zoo)
df[] <- t(na.aggregate(t(df)))
df
#  c1 c2 c3
#1  1  3  2
#2  2  1  1
#3  3  3  3
#4  2  3  1
Emotional °昔
#4 · 2019-01-18 13:43

My solution is

rwmns = rowMeans(df,na.rm=TRUE)
df$c1[is.na(df$c1)] = rwmns[is.na(df$c1)]
df$c2[is.na(df$c2)] = rwmns[is.na(df$c2)]
df$c3[is.na(df$c3)] = rwmns[is.na(df$c3)]
> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1

Is there a more elegant way, especially when someone has many columns?

一夜七次
#5 · 2019-01-18 13:48

Using apply (note the returned object is a matrix):

t( apply( df , 1 , function(x) { x[ is.na(x) ] = mean( x , na.rm = TRUE ); x } ) )
     c1 c2 c3
[1,]  1  3  2
[2,]  2  1  1
[3,]  3  3  3
[4,]  2  3  1

We use an anonymous function to replace each NA in a row with the mean of that row. The only advantage is that you don't have to do any more typing if the number of columns (or rows) increases. It is not particularly efficient or fast in a computational sense, but it is easy to read and reason about (you won't notice the speed difference unless you have hundreds of thousands of rows).
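If a data frame is wanted rather than a matrix, one possible variation is to assign the result back with `df[] <-`, which keeps the data frame structure and column names. This is a minimal sketch on the question's data; the function name `fill_row` is hypothetical.

```r
# Data from the question
df <- data.frame(c1 = c(1, 2, 3, NA), c2 = c(3, 1, NA, 3), c3 = c(2, 1, 3, 1))

# Hypothetical named version of the anonymous function above
fill_row <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # replace NAs with the mean of this row
  x
}

# apply() returns the filled rows as columns, so transpose before assigning;
# df[] <- keeps df as a data frame instead of replacing it with a matrix
df[] <- t(apply(df, 1, fill_row))
```

Note that a row consisting only of NAs has no mean to fall back on, so `mean(x, na.rm = TRUE)` yields NaN for it.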

"骚年 ilove
#6 · 2019-01-18 13:51

I think this works,

ind <- which(is.na(df), arr.ind=TRUE)
df[ind] <- rowMeans(df, na.rm=TRUE)[ind[, 1]]  # index the means by each NA's row so the values line up
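A caveat when pairing NA positions with row means: `which(..., arr.ind=TRUE)` lists NA positions column by column, whereas `rowMeans()` on a subset of rows returns the means in row order, so the two orders need not agree. A minimal check on the question's data (the variable names are illustrative):

```r
# Data from the question
df <- data.frame(c1 = c(1, 2, 3, NA), c2 = c(3, 1, NA, 3), c3 = c(2, 1, 3, 1))

# NA positions come back in column-major order: the NA in c1 (row 4)
# is listed before the NA in c2 (row 3)
ind <- which(is.na(df), arr.ind = TRUE)
na_rows <- ind[, 1]

# Means of the incomplete rows only, in row order (row 3, then row 4)
incomplete_means <- rowMeans(df[!complete.cases(df), ], na.rm = TRUE)

# Means looked up by the row of each NA, in the same order as `ind`
means_by_na <- rowMeans(df, na.rm = TRUE)[na_rows]
```

Here `incomplete_means` is ordered by row while `ind` is ordered by column, so assigning one to the other directly would put the values in the wrong cells; `means_by_na` follows the order of `ind` and can be assigned to `df[ind]` safely.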