I am trying following code to produce confidence intervals of principal component analysis using resampling with replacement (like bootstrap). I am using first 4 columns of iris dataset:
The prcomp function produces following output:
> mydf = iris[1:4]
> print(prcomp(mydf))
Standard deviations:
[1] 2.0562689 0.4926162 0.2796596 0.1543862
Rotation:
PC1 PC2 PC3 PC4
Sepal.Length 0.36138659 -0.65658877 0.58202985 0.3154872
Sepal.Width -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length 0.85667061 0.17337266 -0.07623608 -0.4798390
Petal.Width 0.35828920 0.07548102 -0.54583143 0.7536574
Using resampling with replacement:
> times = 1000
> ll = list()
> for(i in 1:times) {
+ tempdf = mydf[sample(nrow(mydf), replace = TRUE), ]
+ ll[[length(ll)+1]] = prcomp(tempdf)$rotation
+ }
>
> dd = data.frame(apply(simplify2array(ll), 1:2, mean))
> print(dd)
PC1 PC2 PC3 PC4
Sepal.Length 0.005574165 -0.039480258 0.044537991 0.007778055
Sepal.Width -0.002587333 -0.040273812 -0.050793200 -0.005473271
Petal.Length 0.015681233 0.010952361 -0.005769051 -0.011351172
Petal.Width 0.006513656 0.008296928 -0.041805210 0.019109323
Determining lower confidence interval:
> ddlower = data.frame(apply(simplify2array(ll), 1:2, quantile, probs=0.025))
> print(ddlower)
PC1 PC2 PC3 PC4
Sepal.Length -0.3859257 -0.7274809 -0.6560139 -0.3807826
Sepal.Width -0.1127749 -0.7907801 -0.6818251 -0.3941001
Petal.Length -0.8633386 -0.2058064 -0.1333520 -0.4919584
Petal.Width -0.3702979 -0.1328146 -0.6203322 -0.8088710
Determining upper confidence interval:
> ddupper = data.frame(apply(simplify2array(ll), 1:2, quantile, probs=0.975))
> print(ddupper)
PC1 PC2 PC3 PC4
Sepal.Length 0.3860431 0.7250412 0.6632126 0.3831889
Sepal.Width 0.1111863 0.7993649 0.6758156 0.3987939
Petal.Length 0.8638549 0.2106540 0.1318556 0.4915670
Petal.Width 0.3721362 0.1510708 0.6246988 0.8083421
I find that the loading values are very different. Moreover, the confidence intervals are around 0 for all the variables and components. I checked with other (large) datasets also and the findings are very similar. From these confidence intervals none of the loadings are significantly different from 0. There is obviously some error in the code but I cannot seem to find it. Thanks for your help.