I am trying to do a bootstrapped correlation in R. I have two variables Var1 and Var2 and I want to get the bootstrapped p.value of the Pearson correlation.
My variables look like this:
```
          x        y
1  .6080522 1.707642
2 1.4307273 1.772616
3 0.8226198 1.768537
4 1.7714221 1.265276
5 1.5986213 1.855719
6 1.0000000 1.606106
7 1.1678940 1.671457
8 0.6630012 1.608428
9 1.0842423 1.670619
10 0.5592512 1.107783
11 1.6442616 1.492832
12 0.8326965 1.643923
13 1.1696954 1.763181
14 0.7484543 1.762921
15 1.0842423 1.591566
16 0.9014748 1.718669
17 0.7604917 1.782863
18 0.8566499 1.796216
19 1.4307273 1.913675
20 1.7579695 1.903155
```
So far I have this:
```r
data = as.data.frame(data)
x = data$Var1
y = data$Var2
dat = data.frame(x,y)

library(boot)
set.seed(1)

bootCorTest3 <- function(data, i){
  d <- data[i, ]
  results <- cor.test(d$x, d$y, method='pearson')
  c(est = results$estimate, stat = results$statistic, param = results$parameter,
    p.value = results$p.value, CI = results$conf.int)
}

b3 <- boot(dat, bootCorTest3, R = 1000)
b3

# Original (non-bootstrap) statistics with label
b3$t0
colMeans(b3$t)
boot.ci(b3, type = c("norm", "basic", "perc", "bca")) #bootstrapped CI.
```
The bootstrapped p-value should be the one I get from colMeans(b3$t), right?
colMeans(b3$t) gives me this:
```
    est.cor      stat.t    param.df     p.value         CI1         CI2
 0.28495324  2.13981008 48.00000000  0.14418623  0.01438146  0.51726022
```
It seems like everything is working fine. The problem is that I ran the same analysis in different software and the results are wildly different: the p-value I get here is much higher than the one from the other program. I suspect I have done something wrong, as I am not strong in R.
Can anyone give me some feedback on this code? Am I doing something wrong? How would you get the bootstrapped p-value for the Pearson correlation?
Thank you for your time.
If you want to bootstrap your correlation test, your bootstrap statistic function only needs to return the correlation coefficient. Bootstrapping the p-value of the correlation test is not appropriate here: averaging the p-values of the resamples does not produce a bootstrap p-value, and the p-value ignores the direction of the correlation.
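A minimal sketch of that idea, using the 20 rows posted in the question as example data (so the numbers will not match the output above). The function name `cor_stat` is illustrative; `boot()`, `boot.ci()` and `cor()` are the standard functions from the `boot` package and base R.

```r
library(boot)  # 'boot' ships with standard R installations

# Example data: the 20 rows posted in the question
dat <- data.frame(
  x = c(.6080522, 1.4307273, 0.8226198, 1.7714221, 1.5986213,
        1.0000000, 1.1678940, 0.6630012, 1.0842423, 0.5592512,
        1.6442616, 0.8326965, 1.1696954, 0.7484543, 1.0842423,
        0.9014748, 0.7604917, 0.8566499, 1.4307273, 1.7579695),
  y = c(1.707642, 1.772616, 1.768537, 1.265276, 1.855719,
        1.606106, 1.671457, 1.608428, 1.670619, 1.107783,
        1.492832, 1.643923, 1.763181, 1.762921, 1.591566,
        1.718669, 1.782863, 1.796216, 1.913675, 1.903155)
)

# Statistic function: returns only the Pearson correlation of the
# resampled rows (boot() passes the resampled row indices in i)
cor_stat <- function(data, i) {
  d <- data[i, ]
  cor(d$x, d$y, method = "pearson")
}

set.seed(1)
b <- boot(dat, cor_stat, R = 1000)

b$t0                                  # correlation in the original sample
sd(b$t[, 1])                          # bootstrap standard error of r
boot.ci(b, type = c("perc", "bca"))   # bootstrap confidence intervals for r
```

Because the statistic is a single number, `boot.ci()` works on it directly, and the interval (rather than an averaged p-value) is what the bootstrap gives you here.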
Check this question on Cross Validated for some nice answers on performing bootstrap hypothesis tests: https://stats.stackexchange.com/questions/20701/computing-p-value-using-bootstrap-with-r
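One simple recipe, sketched below, is to shift the bootstrap distribution of r so that it is centred on the null value 0, then count how often a shifted replicate is at least as extreme as the observed r. This is not necessarily the approach the linked answers settle on, and it assumes the shape of the sampling distribution of r stays roughly the same under the null, which is only approximately true for correlations. The names `cor_stat`, `r_null` and `p_boot` are illustrative, and the data are again the 20 posted rows.

```r
library(boot)

# Example data: the 20 rows posted in the question
dat <- data.frame(
  x = c(.6080522, 1.4307273, 0.8226198, 1.7714221, 1.5986213,
        1.0000000, 1.1678940, 0.6630012, 1.0842423, 0.5592512,
        1.6442616, 0.8326965, 1.1696954, 0.7484543, 1.0842423,
        0.9014748, 0.7604917, 0.8566499, 1.4307273, 1.7579695),
  y = c(1.707642, 1.772616, 1.768537, 1.265276, 1.855719,
        1.606106, 1.671457, 1.608428, 1.670619, 1.107783,
        1.492832, 1.643923, 1.763181, 1.762921, 1.591566,
        1.718669, 1.782863, 1.796216, 1.913675, 1.903155)
)

# Bootstrap statistic: Pearson correlation of the resampled rows
cor_stat <- function(data, i) cor(data$x[i], data$y[i])

set.seed(1)
b <- boot(dat, cor_stat, R = 10000)

r_obs  <- b$t0       # observed correlation
r_boot <- b$t[, 1]   # bootstrap replicates of r

# Shift the replicates so their distribution is centred on rho = 0,
# then take the share of shifted replicates at least as extreme as
# the observed r (two-sided alternative)
r_null <- r_boot - r_obs
p_boot <- mean(abs(r_null) >= abs(r_obs))
p_boot
```

For a one-sided alternative (rho > 0), the last comparison would become `r_null >= r_obs`, which keeps the direction of the correlation that an averaged two-sided p-value throws away.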
In this case, even without a p-value, it is quite safe to say that most of the mass of the bootstrap sampling distribution is very close to zero.
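To check this kind of statement on your own data, a short sketch (again using the 20 posted rows as example data) can summarise the bootstrap replicates of r and report how much of their mass lies at or below zero. The histogram is only drawn in interactive sessions; `prop_le0` is an illustrative name.

```r
library(boot)

# Example data: the 20 rows posted in the question
dat <- data.frame(
  x = c(.6080522, 1.4307273, 0.8226198, 1.7714221, 1.5986213,
        1.0000000, 1.1678940, 0.6630012, 1.0842423, 0.5592512,
        1.6442616, 0.8326965, 1.1696954, 0.7484543, 1.0842423,
        0.9014748, 0.7604917, 0.8566499, 1.4307273, 1.7579695),
  y = c(1.707642, 1.772616, 1.768537, 1.265276, 1.855719,
        1.606106, 1.671457, 1.608428, 1.670619, 1.107783,
        1.492832, 1.643923, 1.763181, 1.762921, 1.591566,
        1.718669, 1.782863, 1.796216, 1.913675, 1.903155)
)

# Bootstrap statistic: Pearson correlation of the resampled rows
cor_stat <- function(data, i) cor(data$x[i], data$y[i])

set.seed(1)
b <- boot(dat, cor_stat, R = 1000)
r_boot <- b$t[, 1]

summary(r_boot)                     # centre and spread of the replicates
quantile(r_boot, c(0.025, 0.975))   # middle 95% of the replicates
prop_le0 <- mean(r_boot <= 0)       # share of replicates at or below zero
prop_le0

# Histogram with a dashed reference line at zero (interactive sessions only)
if (interactive()) {
  hist(r_boot, breaks = 30, main = "Bootstrap distribution of r", xlab = "r")
  abline(v = 0, lty = 2)
}
```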