Non-parametric bootstrapping on the highest level

I have two-level hierarchical data and I'm trying to perform non-parametric bootstrap sampling on the highest level, i.e., randomly sampling the highest-level clusters with replacement while keeping the original within-cluster data.

I want to achieve this using the boot() function in the {boot} package, for the reason that I then would like to build BCa confidence intervals using boot.ci() which requires a boot object.

Here follows my unlucky attempt - running a debug on the boot call showed that random sampling is not happening at cluster level (=subject).

### create a very simple two-level dataset with 'subject' as clustering variable

rho <- 0.4
dat <- expand.grid(
    trial=factor(1:5),
    subject=factor(1:3)
    )
sig <- rho * tcrossprod(model.matrix(~ 0 + subject, dat))
diag(sig) <- 1
set.seed(17); dat$value <- chol(sig) %*% rnorm(15, 0, 1)


### my statistic function (adapted from here: http://biostat.mc.vanderbilt.edu/wiki/Main/HowToBootstrapCorrelatedData)

resamp.mean <- function(data, i){
    cluster <- c('subject', 'trial')

    # sample the clustering factor
    cls <- unique(data[[cluster[1]]])[i]   

    # subset on the sampled clustering factors
    sub <- lapply(cls, function(b) subset(data, data[[cluster[1]]]==b))   

    sub.2 <- do.call(rbind, sub)      # join and return samples
    mean((sub.2$value))               # calculate the statistic
}


debugonce(boot)
set.seed(17); dat.boot <- boot(data = dat, statistic = resamp.mean, 4)


### stepping trough the debugger until object 'i' was assigned
### investigating 'i'
# Browse[2]> head(i)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]    3    7   12   13   10   14   14   15   12    12    12     4     5     9    10
[2,]   15    9    3   13    4   10    2    4    6    11    10     4     9     4     3
[3,]    8    4    7   15   10   12    9    8    9    12     4    15    14    10     4
[4,]   12    3    1   15    8   13    9    1    4    13     9    13     2    11     2

### which is not what I was hoping for.


### I would like something that looks like this, supposing indices = c(2, 2, 1) for the first resample: 

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]    6    7    8    9   10    6    7    8    9    10     1     2     3     4     5

Any help would be very much appreciated.

标签： r statistics hierarchical-data confidence-interval statistics-bootstrap

1条回答

聊天终结者

2楼-- · 2019-07-20 21:26

I think the problem originates from the modified statistic function (specifically, the cls object within the function). Can you try this one? Uncomment the print statement to see which subjects have been sampled. It does not use the index argument which boot expects, instead it just uses sample as in the original function.

resamp.mean <- function(dat, 
                        indices, 
                        cluster = c('subject', 'trial'), 
                        replace = TRUE){
      # boot expects an indices argument but the sampling happens
      # via sample() as in the original source of the function

      # sample the clustering factor
      cls <- sample(unique(dat[[cluster[1]]]), replace=replace)

      # subset on the sampled clustering factors
      sub <- lapply(cls, function(b) subset(dat, dat[[cluster[1]]]==b))

      # join and return samples
      sub <- do.call(rbind, sub)

      # UNCOMMENT HERE TO SEE SAMPLED SUBJECTS 
      # print(sub)

      mean(sub$value)
}

One resample from the resamp.mean function before the mean of value is calculated looks like this:

    trial subject      value
1       1       1 -1.1581291
2       2       1 -0.1458287
3       3       1 -0.2134525
4       4       1 -0.5796521
5       5       1  0.6501587
11      1       3  2.6678441
12      2       3  1.3945740
13      3       3  1.4849435
14      4       3  0.4086737
15      5       3  1.3399146
111     1       1 -1.1581291
121     2       1 -0.1458287
131     3       1 -0.2134525
141     4       1 -0.5796521
151     5       1  0.6501587

0人赞添加讨论(0) 举报

Non-parametric bootstrapping on the highest level

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间