可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am new in R programming language. I just wanted to know is there any way to impute null values of just one column in our dataset. Because all of imputation commands and libraries that I have seen, impute null values of the whole dataset.

回答1:

Here is an example using the Hmisc package and impute

library(Hmisc)
DF <- data.frame(age = c(10, 20, NA, 40), sex = c('male','female'))

# impute with mean value

DF$imputed_age <- with(DF, impute(age, mean))

# impute with random value
DF$imputed_age2 <- with(DF, impute(age, 'random'))

# impute with the media
with(DF, impute(age, median))
# impute with the minimum
with(DF, impute(age, min))

# impute with the maximum
with(DF, impute(age, max))


# and if you are sufficiently foolish
# impute with number 7 
with(DF, impute(age, 7))

 # impute with letter 'a'
with(DF, impute(age, 'a'))

Look at ?impute for details on how the imputation is implemented

回答2:

Why not use more sophisticated imputation algorithms, such as mice (Multiple Imputation by Chained Equations)? Below is a code snippet in R you can adapt to your case.

library(mice)

#get the nhanes dataset
dat <- mice::nhanes

#impute it with mice
imp <- mice(mice::nhanes, m = 3, print=F)

imputed_dataset_1<-complete(imp,1)

head(imputed_dataset_1)

#     age  bmi hyp chl
# 1   1   22.5   1 118
# 2   2   22.7   1 187
# 3   1   30.1   1 187
# 4   3   24.9   1 186
# 5   1   20.4   1 113
# 6   3   20.4   1 184

#Now, let's see what methods have been used to impute each column
meth<-imp$method
#  age   bmi   hyp   chl
#"" "pmm" "pmm" "pmm"

#The age column is complete, so, it won't be imputed
# Columns bmi, hyp and chl are going to be imputed with pmm (predictive mean matching)

#Let's say that we want to impute only the "hyp" column
#So, we set the methods for the bmi and chl column to ""
meth[c(2,4)]<-""
#age   bmi   hyp   chl 
#""    "" "pmm"    "" 

#Let's run the mice imputation again, this time setting the methods parameter to our modified method
imp <- mice(mice::nhanes, m = 3, print=F, method = meth)

partly_imputed_dataset_1 <- complete(imp, 3)

head(partly_imputed_dataset_1)

#    age  bmi hyp chl
# 1   1   NA   1  NA
# 2   2 22.7   1 187
# 3   1   NA   1 187
# 4   3   NA   2  NA
# 5   1 20.4   1 113
# 6   3   NA   2 184

回答3:

There are plenty of packages that can do this for you. (a little more information about the data could help suggesting you the best options)

One example can be using the VIM package.

It has a function called kNN (k-nearest-neighbor imputation) This function has a option variable where you can specify which variables shall be imputed.

Here is an example:

library("VIM")
kNN(sleep, variable = c("NonD","Gest"))

The sleep dataset I used in this example comes along with VIM.

If there is some time dependency in your columns you want to impute using time series imputation packages could also make sense. In this case you could use for example the imputeTS package. Here is an example:

  library(imputeTS)
  na.kalman(tsAirgap)

The tsAirgap dataset used here as an example comes also along with imputeTS.

Imputation in R

问题:

回答1:

回答2:

回答3:

收藏的人(0)

Imputation in R

问题:

回答1:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮