I have a data frame in which some columns contain missing values. Is there a way (using dplyr) to efficiently calculate the percentage of each column that is missing, i.e. NA? Sort of like a colSums equivalent, so I don't have to calculate the percentage missing for each column individually.
Answer 1:
First, I created some test data for you:
a <- c(1, NA, NA, 4)
b <- c(NA, 2, 3, 4)
x <- data.frame(a, b)
x
#    a  b
# 1  1 NA
# 2 NA  2
# 3 NA  3
# 4  4  4
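Then you can use colMeans(is.na(x)), which returns the proportion (not yet the percentage) of missing values in each column; multiply it by 100 to get a percentage: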
colMeans(is.na(x))
#    a    b
# 0.50 0.25
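This works because is.na() on a data frame returns a logical matrix with one column per variable, and colMeans() counts TRUE as 1 and FALSE as 0, so each column mean is the share of NAs. A minimal base-R sketch of the percentage version, reusing the same test data:

```r
# Same test data as above
x <- data.frame(a = c(1, NA, NA, 4), b = c(NA, 2, 3, 4))

# Logical matrix: TRUE wherever a value is NA
na_matrix <- is.na(x)

# colMeans() treats TRUE as 1, so each mean is the fraction of NAs;
# scaling by 100 turns it into a percentage
pct_missing <- 100 * colMeans(na_matrix)
pct_missing
#  a  b
# 50 25
```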
Answer 2:
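We can use summarise_each() (note that summarise_each() and funs() have been deprecated in later dplyr releases; across(), available from dplyr 1.0.0, is the recommended replacement):
library(dplyr)
# Percentage of NA values in every column
x %>%
  summarise_each(funs(100 * mean(is.na(.))))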
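With a current dplyr (1.0.0 or later, an assumption about the reader's installed version), the same calculation can be sketched with summarise() and across(), which applies one function to every selected column:

```r
library(dplyr)

x <- data.frame(a = c(1, NA, NA, 4), b = c(NA, 2, 3, 4))

# everything() selects all columns; the lambda computes the share of NAs
# in each column and scales it to a percentage
pct_missing <- x %>%
  summarise(across(everything(), ~ 100 * mean(is.na(.x))))
pct_missing
#    a  b
# 1 50 25
```

The result is a one-row data frame, which fits naturally at the end of a dplyr pipeline.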
Answer 3:
Loving the concision of purrr::map() for this type of thing (it returns a list holding the proportion of NAs in each column):
library(purrr)
x %>% map(~ mean(is.na(.)))
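If a plain named numeric vector (like the colMeans() result) is more convenient than a list, purrr::map_dbl() simplifies the output. A small sketch, scaled to percentages and written without the pipe so it needs only purrr:

```r
library(purrr)

x <- data.frame(a = c(1, NA, NA, 4), b = c(NA, 2, 3, 4))

# A data frame is a list of columns, so map_dbl() visits each column and
# returns a named double vector instead of a list
pct_missing <- map_dbl(x, ~ 100 * mean(is.na(.x)))
pct_missing
#  a  b
# 50 25
```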