I want to bootstrap a data set that has groups in it. A simple scenario would be bootstrapping simple means:
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
boot(data, stat, R = 10)
This gives me the error `incorrect number of subscripts on matrix` because of the `by = "group"` part. I managed to solve it using subsetting, but I don't like this solution. Is there a simpler way to make this kind of task work?
In particular, I'd like to introduce an additional argument in the statistic function, like `stat(x, i, groupvar)`, and pass it to the `boot` function, like `boot(data, stat(groupvar = group), R = 100)`.
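On the extra-argument part: `boot()` forwards additional named arguments to the statistic through `...`, so the argument would be supplied by name in the `boot()` call rather than by calling `stat()` inside it. The sketch below assumes that usage. `groupvar` is the hypothetical argument from the question, flattening the per-group means into a plain numeric vector is an assumption about what avoids the subscript error, and `strata` is an optional design choice so that every resample keeps both groups.

```r
library(boot)
library(data.table)

set.seed(42)
dt <- data.table(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5)

# groupvar (hypothetical name) holds the grouping column's name as a string
stat <- function(x, i, groupvar) {
  # per-group means on the resampled rows, sorted by group via keyby
  res <- x[i, .(m1 = mean(x1), m2 = mean(x2)), keyby = groupvar]
  # return a flat numeric vector instead of a table
  c(res$m1, res$m2)
}

# groupvar = "group" is passed through boot()'s `...` to stat();
# strata resamples within each group, so both groups appear in every replicate
b <- boot(dt, stat, R = 100, strata = factor(dt$group), groupvar = "group")
b
```

With this layout the four statistics are ordered as x1 mean for `FALSE`, x1 mean for `TRUE`, then the same two for x2.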
This should do it:
Using
I received an error using the OP's code with the answer supplied by @eddi:
Produces the error message:
The error is fixed by removing `by=group` from the function `stat`:
Which produces the following Bootstrap Statistics results:
Below, I modify the sample dataset to highlight which Bootstrap Statistic goes with which group-column combination:
Consider group 1, which has a mean value of 10 for x1 and a mean value of 10000 for x2, and group 2, which has a mean value of 2000 for x1 and a mean value of 8000 for x2:
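A dataset like the one described can be sketched as follows. The standard deviation of 1, the 100 rows per group, the element names, and the use of `strata` are illustrative assumptions, not the original listing. The point is only that well-separated means make it easy to see which element of `t0` belongs to which group and column.

```r
library(boot)
library(data.table)

set.seed(1)
# group 1: x1 ~ 10, x2 ~ 10000; group 2: x1 ~ 2000, x2 ~ 8000 (sd = 1 is an assumption)
dt <- data.table(
  group = rep(1:2, each = 100),
  x1 = c(rnorm(100, mean = 10),    rnorm(100, mean = 2000)),
  x2 = c(rnorm(100, mean = 10000), rnorm(100, mean = 8000))
)

stat <- function(x, i) {
  # per-group means on the resampled rows, sorted by group
  res <- x[i, .(m1 = mean(x1), m2 = mean(x2)), keyby = group]
  # label each element so the group/column combination is explicit
  c(g1_x1 = res$m1[1], g2_x1 = res$m1[2],
    g1_x2 = res$m2[1], g2_x2 = res$m2[2])
}

# strata keeps both groups in every resample
b <- boot(dt, stat, R = 10, strata = dt$group)
round(b$t0)
```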
Which gives:
Lots of problems in your code before you even get to the by group part.
Did you mean something like this?
Then from there you can worry about doing it by group however you choose to.
For instance:
For bigger datasets, try `data.table`:
I went with `by()` rather than the `data.table`'s `by=` argument because you want the output to be a list. There may be some functionality I don't know about for doing that, but I couldn't find it (see the edit history for the problem it was causing). The subsetting is still done via the `data.table`'s `[]` method, so it should be plenty fast.
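A minimal sketch of the `by()` approach described above, assuming the `boot` and `data.table` packages are installed. The exact statistic and `R = 10` are illustrative rather than the original listing. Base `by()` splits the rows by group and returns a list-like object with one `boot` result per group.

```r
library(boot)
library(data.table)

set.seed(1)
dt <- data.table(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5)

# statistic: column means over the resampled rows;
# row subsetting goes through data.table's `[` method
stat <- function(x, i) x[i, c(m1 = mean(x1), m2 = mean(x2))]

# one independent bootstrap per group; the result is indexed by group value
res <- by(dt, dt$group, function(d) boot(d, stat, R = 10))

res[["TRUE"]]     # boot object for group == TRUE
res[["FALSE"]]$t0 # observed means for group == FALSE
```

Each element is an ordinary `boot` object, so functions such as `boot.ci()` can be applied to one group at a time.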