MICE does not impute certain columns, but also doe

Posted 2020-04-07 02:33

I know that similar questions have been asked before (e.g., 1, 2, 3), but I still cannot understand why MICE fails to impute the missing values in some columns, even when I use the unconditional mean as in example 1.

The sparse matrix I have is:

            k1    k3       k5       k6       k7       k8      k11      k12      k13      k14      k15
 [1,]       NA    NA       NA       NA       NA       NA       NA       NA       NA       NA 0.066667
 [2,] 0.909091    NA       NA       NA       NA 0.944723       NA       NA 0.545455       NA       NA
 [3,] 0.545455    NA       NA       NA       NA       NA       NA       NA 0.818182 0.800000 0.466667
 [4,] 0.545455    NA 0.642857       NA       NA 0.260954       NA       NA       NA       NA       NA
 [5,]       NA 0.750 0.500000       NA 0.869845       NA 0.595013       NA       NA       NA       NA
 [6,] 0.727273 0.625       NA 0.583333       NA       NA       NA 0.500000 0.545455       NA       NA
 [7,]       NA    NA 0.571429       NA       NA       NA       NA       NA       NA       NA 0.866667
 [8,] 0.545455    NA       NA       NA       NA 0.905593 0.677757       NA       NA       NA       NA
 [9,]       NA 0.999 0.714286 0.750000       NA       NA 0.881032       NA       NA 0.933333 0.733333
[10,]       NA 0.750       NA       NA       NA       NA       NA       NA 0.545455       NA       NA
[11,]       NA    NA       NA       NA       NA       NA       NA       NA 0.818182       NA       NA
[12,]       NA 0.999       NA 0.583333       NA       NA 0.986145 0.666667 0.909091       NA       NA
[13,] 0.818182    NA 0.857143 0.583333 0.001000       NA       NA       NA       NA 0.133333       NA
[14,]       NA 0.999 0.357143       NA 0.635087       NA       NA       NA       NA       NA       NA
[15,]       NA 0.750 0.857143 0.250000 0.742082 0.001000 0.001000       NA 0.636364       NA 0.533333
[16,]       NA 0.999       NA 0.250000       NA       NA       NA       NA 0.909091       NA       NA
[17,] 0.727273 0.999 0.001000       NA       NA       NA 0.886366 0.666667 0.909091 0.800000 0.933333
[18,]       NA    NA 0.571429       NA       NA 0.953382       NA 0.833333 0.727273       NA       NA
[19,]       NA    NA       NA       NA 0.661476       NA       NA 0.500000       NA 0.933333 0.600000
[20,]       NA    NA 0.857143       NA 0.661661 0.459014 0.283793       NA       NA       NA       NA
[21,]       NA    NA       NA       NA       NA       NA       NA       NA       NA       NA 0.800000
[22,] 0.454545    NA       NA       NA       NA       NA       NA 0.333333 0.727273       NA 0.533333
[23,]       NA    NA       NA 0.333333 0.790737       NA       NA       NA 0.727273 0.433333       NA
[24,]       NA 0.875       NA       NA       NA       NA       NA       NA       NA 0.999000       NA
[25,]       NA    NA 0.571429 0.583333       NA       NA 0.196147 0.500000       NA       NA       NA
[26,]       NA 0.999 0.642857 0.250000       NA       NA       NA       NA 0.636364 0.700000       NA
[27,]       NA    NA 0.714286       NA       NA       NA       NA       NA       NA       NA       NA
[28,]       NA 0.875       NA 0.500000       NA       NA       NA       NA       NA       NA 0.666667
[29,] 0.636364 0.750       NA       NA       NA 0.999000 0.999000       NA       NA       NA       NA
[30,] 0.727273    NA       NA       NA 0.916098 0.734748       NA       NA       NA 0.833333       NA
[31,]       NA    NA       NA       NA       NA       NA       NA       NA       NA       NA 0.733333
[32,]       NA 0.875       NA 0.500000       NA       NA       NA       NA 0.818182       NA       NA
[33,] 0.636364    NA       NA       NA       NA       NA 0.829819       NA 0.727273       NA 0.733333
[34,]       NA    NA 0.500000       NA       NA       NA       NA       NA       NA       NA 0.666667
[35,]       NA    NA 0.214286       NA       NA 0.529592       NA 0.001000 0.909091       NA       NA
[36,]       NA    NA       NA 0.416667 0.808369       NA       NA 0.500000 0.909091 0.633333 0.733333
[37,]       NA    NA 0.357143       NA       NA 0.837555 0.755077       NA 0.818182       NA       NA
[38,]       NA    NA       NA 0.166667 0.841643 0.364216       NA       NA       NA 0.733333       NA
[39,]       NA    NA 0.500000 0.750000       NA       NA       NA       NA 0.818182 0.999000 0.800000
[40,]       NA    NA       NA       NA 0.931836       NA       NA       NA       NA       NA 0.133333
[41,]       NA    NA 0.714286       NA       NA 0.848688       NA       NA       NA       NA       NA
[42,]       NA    NA 0.214286 0.333333 0.700812 0.208412       NA 0.333333       NA       NA       NA
[43,] 0.454545    NA       NA       NA 0.109326 0.346767 0.877241 0.833333       NA       NA       NA
[44,] 0.818182    NA 0.857143       NA       NA 0.931636       NA       NA       NA 0.733333       NA
[45,] 0.363636 0.750       NA       NA       NA       NA       NA 0.166667 0.818182       NA       NA
[46,]       NA    NA 0.785714       NA 0.738672       NA       NA       NA       NA 0.100000       NA
[47,] 0.181818    NA       NA       NA       NA       NA       NA       NA       NA       NA 0.001000
[48,]       NA    NA 0.001000 0.083333 0.308050 0.139592       NA 0.166667       NA       NA       NA
[49,]       NA    NA       NA       NA 0.561841 0.817696       NA 0.666667       NA 0.300000       NA
[50,]       NA    NA       NA 0.416667       NA       NA       NA       NA 0.545455       NA 0.866667
[51,]       NA 0.875       NA       NA 0.039781       NA       NA       NA       NA 0.933333       NA
[52,]       NA    NA 0.357143       NA       NA       NA       NA 0.333333       NA       NA       NA
[53,]       NA 0.999       NA       NA       NA 0.835015       NA       NA       NA 0.833333 0.666667
[54,]       NA 0.750       NA 0.416667       NA       NA 0.623528 0.333333 0.818182       NA       NA
[55,]       NA    NA       NA 0.666667       NA 0.878312       NA       NA       NA       NA       NA                                                      

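And I apply the following standard mice call:

library(mice)
# Impute with the unconditional column mean (default m = 5 imputations, 30 iterations each).
res <- mice(Sparse_Data, maxit = 30, method = "mean", seed = 500, printFlag = FALSE)
# Stack the original data (.imp == 0) and all imputed data sets in long format.
long <- complete(res, action = "long", include = TRUE)
# Drop the original data, then average the imputed data sets element-wise.
out <- split(long, f = long$.imp)[-1]
a <- Reduce("+", out) / length(out)
# Drop the two bookkeeping columns (.imp and .id).
data_Pred <- a[, 3:ncol(a)]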

The predicted matrix I get is:

           k1        k3        k5        k6        k7        k8      k11       k12       k13       k14      k15
56  0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.066667
57  0.9090910 0.8676667 0.5373542 0.4429824 0.6069598 0.9447230       NA 0.4583958 0.5454550 0.6959606       NA
58  0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.8181820 0.8000000 0.466667
59  0.5454550 0.8676667 0.6428570 0.4429824 0.6069598 0.2609540       NA 0.4583958 0.7561986 0.6959606       NA
60  0.6060607 0.7500000 0.5000000 0.4429824 0.8698450 0.6313629 0.595013 0.4583958 0.7561986 0.6959606       NA
61  0.7272730 0.6250000 0.5373542 0.5833330 0.6069598 0.6313629       NA 0.5000000 0.5454550 0.6959606       NA
62  0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.866667
63  0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.9055930 0.677757 0.4583958 0.7561986 0.6959606       NA
64  0.6060607 0.9990000 0.7142860 0.7500000 0.6069598 0.6313629 0.881032 0.4583958 0.7561986 0.9333330 0.733333
65  0.6060607 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.5454550 0.6959606       NA
66  0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.8181820 0.6959606       NA
67  0.6060607 0.9990000 0.5373542 0.5833330 0.6069598 0.6313629 0.986145 0.6666670 0.9090910 0.6959606       NA
68  0.8181820 0.8676667 0.8571430 0.5833330 0.0010000 0.6313629       NA 0.4583958 0.7561986 0.1333330       NA
69  0.6060607 0.9990000 0.3571430 0.4429824 0.6350870 0.6313629       NA 0.4583958 0.7561986 0.6959606       NA
70  0.6060607 0.7500000 0.8571430 0.2500000 0.7420820 0.0010000 0.001000 0.4583958 0.6363640 0.6959606 0.533333
71  0.6060607 0.9990000 0.5373542 0.2500000 0.6069598 0.6313629       NA 0.4583958 0.9090910 0.6959606       NA
72  0.7272730 0.9990000 0.0010000 0.4429824 0.6069598 0.6313629 0.886366 0.6666670 0.9090910 0.8000000 0.933333
73  0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.9533820       NA 0.8333330 0.7272730 0.6959606       NA
74  0.6060607 0.8676667 0.5373542 0.4429824 0.6614760 0.6313629       NA 0.5000000 0.7561986 0.9333330 0.600000
75  0.6060607 0.8676667 0.8571430 0.4429824 0.6616610 0.4590140 0.283793 0.4583958 0.7561986 0.6959606       NA
76  0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.800000
77  0.4545450 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.3333330 0.7272730 0.6959606 0.533333
78  0.6060607 0.8676667 0.5373542 0.3333330 0.7907370 0.6313629       NA 0.4583958 0.7272730 0.4333330       NA
79  0.6060607 0.8750000 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.9990000       NA
80  0.6060607 0.8676667 0.5714290 0.5833330 0.6069598 0.6313629 0.196147 0.5000000 0.7561986 0.6959606       NA
81  0.6060607 0.9990000 0.6428570 0.2500000 0.6069598 0.6313629       NA 0.4583958 0.6363640 0.7000000       NA
82  0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606       NA
83  0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.666667
84  0.6363640 0.7500000 0.5373542 0.4429824 0.6069598 0.9990000 0.999000 0.4583958 0.7561986 0.6959606       NA
85  0.7272730 0.8676667 0.5373542 0.4429824 0.9160980 0.7347480       NA 0.4583958 0.7561986 0.8333330       NA
86  0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.733333
87  0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629       NA 0.4583958 0.8181820 0.6959606       NA
88  0.6363640 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 0.829819 0.4583958 0.7272730 0.6959606 0.733333
89  0.6060607 0.8676667 0.5000000 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.666667
90  0.6060607 0.8676667 0.2142860 0.4429824 0.6069598 0.5295920       NA 0.0010000 0.9090910 0.6959606       NA
91  0.6060607 0.8676667 0.5373542 0.4166670 0.8083690 0.6313629       NA 0.5000000 0.9090910 0.6333330 0.733333
92  0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.8375550 0.755077 0.4583958 0.8181820 0.6959606       NA
93  0.6060607 0.8676667 0.5373542 0.1666670 0.8416430 0.3642160       NA 0.4583958 0.7561986 0.7333330       NA
94  0.6060607 0.8676667 0.5000000 0.7500000 0.6069598 0.6313629       NA 0.4583958 0.8181820 0.9990000 0.800000
95  0.6060607 0.8676667 0.5373542 0.4429824 0.9318360 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.133333
96  0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.8486880       NA 0.4583958 0.7561986 0.6959606       NA
97  0.6060607 0.8676667 0.2142860 0.3333330 0.7008120 0.2084120       NA 0.3333330 0.7561986 0.6959606       NA
98  0.4545450 0.8676667 0.5373542 0.4429824 0.1093260 0.3467670 0.877241 0.8333330 0.7561986 0.6959606       NA
99  0.8181820 0.8676667 0.8571430 0.4429824 0.6069598 0.9316360       NA 0.4583958 0.7561986 0.7333330       NA
100 0.3636360 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.1666670 0.8181820 0.6959606       NA
101 0.6060607 0.8676667 0.7857140 0.4429824 0.7386720 0.6313629       NA 0.4583958 0.7561986 0.1000000       NA
102 0.1818180 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629       NA 0.4583958 0.7561986 0.6959606 0.001000
103 0.6060607 0.8676667 0.0010000 0.0833330 0.3080500 0.1395920       NA 0.1666670 0.7561986 0.6959606       NA
104 0.6060607 0.8676667 0.5373542 0.4429824 0.5618410 0.8176960       NA 0.6666670 0.7561986 0.3000000       NA
105 0.6060607 0.8676667 0.5373542 0.4166670 0.6069598 0.6313629       NA 0.4583958 0.5454550 0.6959606 0.866667
106 0.6060607 0.8750000 0.5373542 0.4429824 0.0397810 0.6313629       NA 0.4583958 0.7561986 0.9333330       NA
107 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.6313629       NA 0.3333330 0.7561986 0.6959606       NA
108 0.6060607 0.9990000 0.5373542 0.4429824 0.6069598 0.8350150       NA 0.4583958 0.7561986 0.8333330 0.666667
109 0.6060607 0.7500000 0.5373542 0.4166670 0.6069598 0.6313629 0.623528 0.3333330 0.8181820 0.6959606       NA
110 0.6060607 0.8676667 0.5373542 0.6666670 0.6069598 0.8783120       NA 0.4583958 0.7561986 0.6959606       NA                                  

Maybe someone can shed some light on the problem?

Tags: r r-mice
1 answer
闹够了就滚
#2 · 2020-04-07 03:08

Ok, so here's the deal: mice relies on its predictor matrix (the predictorMatrix argument). This matrix determines from which columns the missing values of each variable are predicted. If a column is empty, then that variable will not be imputed, regardless of the method you specify.

You can check this matrix by running mice and then typing res$pred (short for res$predictorMatrix). As you can see, the columns for k11 and k15 are empty, and therefore they aren't imputed. Purely as an example (NOT A SOLUTION), try specifying mice(pred = diag(ncol(Sparse_Data)), ...). You'll see that it now works. [Edit: For future readers: this is not a way to SOLVE the problem, just a way to show where the problem is.]

So why does mice make those two columns empty? I looked into the source code of mice. Within it, there is a function called check.data, which calls find.collinear. That function identifies the collinear variables, which are then removed in subsequent steps.

Are any of your columns collinear? Well, yes:

cor(Sparse_Data, use = "pairwise.complete.obs")
            k1            k3          k5            k6          k7           k8        k11        k12          k13         k14         k15
k1   1.0000000  1.740412e-01  0.24932705            NA  0.17164319  0.640984131  0.3053596  0.4225772 -0.536055739 -0.50460872  0.97321365
k3   0.1740412  1.000000e+00 -0.42409199 -9.370804e-05 -0.38583663  0.361416106  0.5515156  0.6567106  0.634250161 -0.70631658  0.74001342
k5   0.2493271 -4.240920e-01  1.00000000  4.471829e-01  0.02679894  0.234850334 -0.6624768  0.4201946 -0.924517670 -0.45408744 -0.78628746
k6          NA -9.370804e-05  0.44718290  1.000000e+00 -0.35377747  0.818644775  0.6824749  0.8899878  0.147657537  0.27030472  0.49159991
k7   0.1716432 -3.858366e-01  0.02679894 -3.537775e-01  1.00000000  0.207791538 -0.6406942 -0.2863018  0.898687181  0.14987951 -0.70210859
k8   0.6409841  3.614161e-01  0.23485033  8.186448e-01  0.20779154  1.000000000  0.7491736  0.5219197  0.002468839 -0.13067177  1.00000000
k11  0.3053596  5.515156e-01 -0.66247684  6.824749e-01 -0.64069422  0.749173578  1.0000000  0.5925582  0.830372468 -1.00000000  0.83452358
k12  0.4225772  6.567106e-01  0.42019459  8.899878e-01 -0.28630180  0.521919747  0.5925582  1.0000000 -0.134937885 -0.49251775  0.92582043
k13 -0.5360557  6.342502e-01 -0.92451767  1.476575e-01  0.89868718  0.002468839  0.8303725 -0.1349379  1.000000000  0.29508347  0.13853862
k14 -0.5046087 -7.063166e-01 -0.45408744  2.703047e-01  0.14987951 -0.130671767 -1.0000000 -0.4925177  0.295083470  1.00000000  0.02558161
k15  0.9732137  7.400134e-01 -0.78628746  4.915999e-01 -0.70210859  1.000000000  0.8345236  0.9258204  0.138538625  0.02558161  1.00000000

As you can see, k11 is perfectly correlated with k14, and k15 with k8. This is why they get kicked out.
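It may help to see how little data sits behind those two coefficients. With use = "pairwise.complete.obs", each correlation is computed only from the rows where both columns are observed, and any two points with distinct values give a correlation of exactly +1 or -1. In the matrix above, k11 and k14 are both observed only in rows 9 and 17, and k8 and k15 only in rows 15 and 53. The sketch below reproduces the effect on a toy matrix (the name toy and its values are made up for illustration) and counts the shared rows with crossprod.

```r
# Toy matrix (hypothetical values): x and y are both observed in rows 1-2 only.
toy <- cbind(
  x = c(0.88, 0.89, NA,   0.20, NA),
  y = c(0.93, 0.80, 0.50, NA,   NA),
  z = c(0.10, 0.70, 0.40, 0.90, 0.30)
)

# Pairwise correlations, each using only rows where both columns are observed.
r <- cor(toy, use = "pairwise.complete.obs")

# Number of rows in which both columns of each pair are observed.
n_shared <- crossprod(!is.na(toy))

r["x", "y"]         # -1 (up to floating-point rounding): two points always lie on a line
n_shared["x", "y"]  # 2
```

Running crossprod(!is.na(Sparse_Data)) on the real data should show the same pattern for the two flagged pairs.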

So, there are two solutions: either make sure that there are no perfectly correlated pairs in your matrix, or, in this case, just provide the predictor matrix yourself.
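To check the first option before calling mice, one could list the column pairs whose pairwise correlation is at or near ±1, together with the number of rows each coefficient rests on. The sketch below is only an illustration and not part of mice: the helper name flag_collinear, the 0.99 cutoff, and the toy data are all assumptions made here.

```r
# Hypothetical helper (not a mice function): list column pairs whose
# pairwise-complete correlation exceeds `cutoff` in absolute value.
flag_collinear <- function(x, cutoff = 0.99) {
  x <- as.matrix(x)
  r <- suppressWarnings(cor(x, use = "pairwise.complete.obs"))
  n <- crossprod(!is.na(x))  # shared observed rows per pair
  # Look at each pair once (upper triangle) and skip undefined correlations.
  hit <- which(upper.tri(r) & !is.na(r) & abs(r) > cutoff, arr.ind = TRUE)
  data.frame(
    var1 = colnames(x)[hit[, "row"]],
    var2 = colnames(x)[hit[, "col"]],
    r = r[hit],
    n_shared = n[hit],
    row.names = NULL
  )
}

# Toy data (made-up values): a and b overlap in only two rows.
toy <- cbind(
  a = c(0.1, 0.4, NA,  0.9, NA),
  b = c(0.3, 0.2, 0.8, NA,  NA),
  c = c(0.5, 0.1, 0.7, 0.2, 0.6)
)
flagged <- flag_collinear(toy)
flagged
```

A pair that is flagged with a very small n_shared is more likely an artifact of sparse overlap than a real relationship between the variables.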

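Edit: To further prove my point, try running this code before your own and you'll see that it does indeed work. Filling these cells in row 1 gives each of the two pairs a third shared observation, so their correlations are no longer exactly ±1:

Sparse_Data[1, "k11"] <- 2
Sparse_Data[1, "k15"] <- 2
Sparse_Data[1, "k8"] <- 0.5
Sparse_Data[1, "k14"] <- 0.5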
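As a cross-check on whatever mice returns with the mean method, unconditional mean imputation can also be written directly in base R: every missing cell is replaced by the mean of the observed values in its column. This is a minimal sketch, not the mice implementation; the function name impute_col_means and the toy matrix are made up here.

```r
# Base-R sketch of unconditional mean imputation (no mice involved):
# every NA is replaced by the mean of the observed values in its column.
impute_col_means <- function(x) {
  x <- as.matrix(x)
  mu <- colMeans(x, na.rm = TRUE)          # per-column mean of observed values
  miss <- which(is.na(x), arr.ind = TRUE)  # (row, col) index of each missing cell
  x[miss] <- mu[miss[, "col"]]             # fill each cell with its column mean
  x
}

# Toy matrix (made-up values).
toy <- cbind(a = c(1, NA, 3), b = c(NA, 4, 8))
filled <- impute_col_means(toy)
filled
```

A column with no observed values at all would come back as NaN, so such columns need separate handling.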