Bootstrapped correlation in R

I am trying to do a bootstrapped correlation in R. I have two variables, Var1 and Var2, and I want to get the bootstrapped p-value of the Pearson correlation.

My variables look like this:
      x            y
1   0.6080522   1.707642
2   1.4307273   1.772616
3   0.8226198   1.768537
4   1.7714221   1.265276
5   1.5986213   1.855719
6   1.0000000   1.606106
7   1.1678940   1.671457
8   0.6630012   1.608428
9   1.0842423   1.670619
10  0.5592512   1.107783
11  1.6442616   1.492832
12  0.8326965   1.643923
13  1.1696954   1.763181
14  0.7484543   1.762921
15  1.0842423   1.591566
16  0.9014748   1.718669
17  0.7604917   1.782863
18  0.8566499   1.796216
19  1.4307273   1.913675
20  1.7579695   1.903155

So far I have this:

data = as.data.frame(data)
x = data$Var1
y = data$Var2
dat = data.frame(x,y)

library(boot)
set.seed(1)
bootCorTest3 <- function(data, i){
  d <- data[i, ]
  results  <- cor.test(d$x, d$y, method='pearson')
  c(est = results$estimate, stat = results$statistic, param = results$parameter, p.value = results$p.value, CI = results$conf.int)
}


b3 <- boot(dat, bootCorTest3, R = 1000)
b3

# Original (non-bootstrap) statistics with label
b3$t0
colMeans(b3$t)
boot.ci(b3, type = c("norm", "basic", "perc", "bca")) #bootstrapped CI.

The bootstrapped p value should be the one I get with colMeans(b3$t), right?

colMeans(b3$t) gives me this:

est.cor      stat.t    param.df     p.value         CI1         CI2
 0.28495324  2.13981008 48.00000000  0.14418623  0.01438146  0.51726022

Everything seems to work. The problem is that I ran the same analysis in different software and got very different results: the p-value here is much higher than the one from the other software. I suspect I have done something wrong, as I am not strong in R.

Can anyone give me some feedback on this code? Am I doing something wrong? How would you get the bootstrapped p-value for the Pearson correlation?

Thank you for your time.

Tags: r correlation statistics-bootstrap

1 answer

Posted 2020-05-03 13:33

If you want to bootstrap your correlation test, your statistic function only needs to return the correlation coefficient. Bootstrapping the p-value of the correlation test is not appropriate in this case, because the p-value ignores the direction (sign) of the correlation.
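To make the point above concrete, here is a minimal hand-rolled sketch of the ordinary nonparametric bootstrap that `boot()` performs with a statistic that returns only r: rows are resampled with replacement and the correlation is recomputed each time. It uses base R only and, as an assumption for illustration, the 20 rows listed in the question (the question's output shows df = 48, so the full data set is larger and the numbers will differ). The names `cor_stat`, `r_obs` and `r_boot` are hypothetical.

```r
# Sketch: a hand-written equivalent of boot(dat, statistic, R = 1000)
# for a statistic that returns only Pearson's r.
set.seed(1)

# Assumption: the 20 rows shown in the question, used as toy data
dat <- data.frame(
  x = c(0.6080522, 1.4307273, 0.8226198, 1.7714221, 1.5986213,
        1.0000000, 1.1678940, 0.6630012, 1.0842423, 0.5592512,
        1.6442616, 0.8326965, 1.1696954, 0.7484543, 1.0842423,
        0.9014748, 0.7604917, 0.8566499, 1.4307273, 1.7579695),
  y = c(1.707642, 1.772616, 1.768537, 1.265276, 1.855719,
        1.606106, 1.671457, 1.608428, 1.670619, 1.107783,
        1.492832, 1.643923, 1.763181, 1.762921, 1.591566,
        1.718669, 1.782863, 1.796216, 1.913675, 1.903155)
)

# Statistic with the same (data, indices) signature boot() expects
cor_stat <- function(data, i) cor(data$x[i], data$y[i], method = "pearson")

R <- 1000
n <- nrow(dat)
r_obs <- cor_stat(dat, seq_len(n))   # statistic on the original rows

# Each replicate: draw n row indices with replacement, recompute r
r_boot <- replicate(R, cor_stat(dat, sample.int(n, n, replace = TRUE)))

# Summary in the same terms print(boot_object) uses
bias <- mean(r_boot) - r_obs
std_error <- sd(r_boot)
print(c(original = r_obs, bias = bias, std.error = std_error))
```

The vector `r_boot` plays the role of `b3$t`; its spread around `r_obs` is what the bootstrap confidence intervals are built from.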

Check this question on CrossValidated for some nice answers on performing bootstrap hypothesis tests: https://stats.stackexchange.com/questions/20701/computing-p-value-using-bootstrap-with-r
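One approach discussed for bootstrap hypothesis tests is to resample under the null hypothesis. The sketch below is an assumption-laden illustration of that idea, not the answer's method: to mimic "no association", x and y are resampled independently of each other, r is recomputed for each replicate, and the two-sided p-value is the share of null replicates at least as extreme as the observed r (with the common +1 correction). Again the 20 rows from the question are used as toy data, and `r_null` and `p_boot` are hypothetical names.

```r
# Sketch: bootstrap test of H0 "x and y are unassociated" by resampling
# x and y independently (which destroys any pairing between them).
set.seed(1)

# Assumption: the 20 rows shown in the question, used as toy data
dat <- data.frame(
  x = c(0.6080522, 1.4307273, 0.8226198, 1.7714221, 1.5986213,
        1.0000000, 1.1678940, 0.6630012, 1.0842423, 0.5592512,
        1.6442616, 0.8326965, 1.1696954, 0.7484543, 1.0842423,
        0.9014748, 0.7604917, 0.8566499, 1.4307273, 1.7579695),
  y = c(1.707642, 1.772616, 1.768537, 1.265276, 1.855719,
        1.606106, 1.671457, 1.608428, 1.670619, 1.107783,
        1.492832, 1.643923, 1.763181, 1.762921, 1.591566,
        1.718669, 1.782863, 1.796216, 1.913675, 1.903155)
)

r_obs <- cor(dat$x, dat$y, method = "pearson")
n <- nrow(dat)
R <- 2000

# Null distribution of r: each variable resampled on its own
r_null <- replicate(R, cor(sample(dat$x, n, replace = TRUE),
                           sample(dat$y, n, replace = TRUE)))

# Two-sided p-value: how often |r*| is at least as large as |r_obs|
p_boot <- (sum(abs(r_null) >= abs(r_obs)) + 1) / (R + 1)
print(p_boot)

# For comparison: the parametric p-value from cor.test on the original data
print(cor.test(dat$x, dat$y, method = "pearson")$p.value)
```

Unlike averaging the p-values of the resamples, this yields one p-value for the observed data, computed from a distribution generated under the null.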

library("boot")
# Read the two unnamed columns and name them x and y;
# read.csv() already returns a data frame, and boot() below uses it directly
data <- read.csv("~/Documents/stack/tmp.csv", header = FALSE)
colnames(data) <- c("x", "y")

set.seed(1)

b3 <- boot(data, 
  statistic = function(data, i) {
    cor(data[i, "x"], data[i, "y"], method='pearson')
  },
  R = 1000
)
b3
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot(data = data, statistic = function(data, i) {
#>     cor(data[i, "x"], data[i, "y"], method = "pearson")
#> }, R = 1000)
#> 
#> 
#> Bootstrap Statistics :
#>      original        bias    std. error
#> t1* 0.1279691 -0.0004316781    0.314056
boot.ci(b3, type = c("norm", "basic", "perc", "bca")) #bootstrapped CI. 
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 1000 bootstrap replicates
#> 
#> CALL : 
#> boot.ci(boot.out = b3, type = c("norm", "basic", "perc", "bca"))
#> 
#> Intervals : 
#> Level      Normal              Basic         
#> 95%   (-0.4871,  0.7439 )   (-0.4216,  0.7784 )  
#> 
#> Level     Percentile            BCa          
#> 95%   (-0.5225,  0.6775 )   (-0.5559,  0.6484 )  
#> Calculations and Intervals on Original Scale

plot(density(b3$t))
abline(v = 0, lty = "dashed", col = "grey60")

In this case, even without a p-value, it is fairly safe to say that a large part of the mass of the bootstrap sampling distribution lies close to zero; all four 95% intervals above contain 0.
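If a single number is still wanted, one rough option (an assumption on my part, not something `boot` reports) is to invert the percentile interval: take twice the smaller of the fractions of bootstrap replicates of r that fall at or below 0 and at or above 0. This approximates the smallest alpha at which the (1 - alpha) percentile interval would exclude zero. The sketch uses the question's 20 rows as toy data; `r_boot` and `p_perc` are hypothetical names.

```r
# Sketch: approximate two-sided p-value from bootstrap replicates of r,
# obtained by inverting the percentile confidence interval at 0.
set.seed(1)

# Assumption: the 20 rows shown in the question, used as toy data
dat <- data.frame(
  x = c(0.6080522, 1.4307273, 0.8226198, 1.7714221, 1.5986213,
        1.0000000, 1.1678940, 0.6630012, 1.0842423, 0.5592512,
        1.6442616, 0.8326965, 1.1696954, 0.7484543, 1.0842423,
        0.9014748, 0.7604917, 0.8566499, 1.4307273, 1.7579695),
  y = c(1.707642, 1.772616, 1.768537, 1.265276, 1.855719,
        1.606106, 1.671457, 1.608428, 1.670619, 1.107783,
        1.492832, 1.643923, 1.763181, 1.762921, 1.591566,
        1.718669, 1.782863, 1.796216, 1.913675, 1.903155)
)

n <- nrow(dat)

# Ordinary bootstrap replicates of r (resampling whole rows)
r_boot <- replicate(1000, {
  i <- sample.int(n, n, replace = TRUE)
  cor(dat$x[i], dat$y[i], method = "pearson")
})

# Share of replicates on each side of zero; double the smaller one, cap at 1
p_perc <- min(1, 2 * min(mean(r_boot <= 0), mean(r_boot >= 0)))
print(p_perc)
```

With the answer's `b3` object, the same line could be applied to `b3$t[, 1]` instead of `r_boot`.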
