Randomly Assign Integers in R within groups withou

Posted 2019-09-08 03:28

Question:

I am running a study with two experiments: experiment_1 and experiment_2. Each experiment has 5 different treatments (i.e. 1, 2, 3, 4, 5). We are trying to randomly assign the treatments within groups.

We would like to do this by iteratively sampling without replacement within each group. The goal is to ensure that the treatments are as balanced as possible (e.g. we don't want to end up with 4 subjects in group 1 assigned to treatment 2 and no one assigned to treatment 1). So if a group has 23 subjects, we want to split the respondents into 4 subgroups of 5 and 1 subgroup of 3. We then want to randomly sample without replacement across the first subgroup of 5, so that each subject gets a different one of the 5 treatments, do the same for the second, third, and fourth subgroups of 5, and for the final subgroup of 3 randomly sample 3 of the 5 treatments without replacement. This guarantees that within this group every treatment is assigned to at least 4 subjects, and 3 treatments are assigned to 5 subjects. We would like to do this for all the groups in the experiment and for both experiments. The resulting output would look something like this:

         group   experiment_1   experiment_2
    [1,]     1           5             3
    [2,]     1           3             2
    [3,]     1           4             4
    [4,]     1           1             5
    [5,]     1           2             1
    [6,]     1           2             3
    [7,]     1           4             1
    [8,]     1           3             2
    [9,]     2           5             5
   [10,]     2           1             4
   [11,]     2           3             4
   [12,]     2           1             5
   [13,]     2           2             1
      .      .           .             .
      .      .           .             .
      .      .           .             .

I know how to use the sample function, but am unsure how to sample without replacement within each group so that the output corresponds to the procedure described above. Any help would be appreciated.

Answer 1:

I think we just need to shuffle the sample IDs; see this example:

set.seed(124)
# prepare groups (3 groups of 9) and shuffled sample IDs
df <- data.frame(group = rep(1:3, each = 9),
                 sampleID = sample(1:27, 27))

# cycle through the treatments down the rows; ex2 is ex1 shifted by one
df$ex1 <- rep(c(1,2,3,4,5), length.out = nrow(df))
df$ex2 <- rep(c(2,3,4,5,1), length.out = nrow(df))

# sort by group, then by sample ID
df <- df[order(df$group, df$sampleID), ]

#check treatment distribution
with(df,table(group,ex1))
#       ex1
# group 1 2 3 4 5
#     1 2 2 2 2 1
#     2 2 2 2 1 2
#     3 2 2 1 2 2
with(df,table(group,ex2))
#       ex2
# group 1 2 3 4 5
#     1 1 2 2 2 2
#     2 2 2 2 2 1
#     3 2 2 2 1 2
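
Note that in the example above ex2 is always ex1 shifted by one (1 becomes 2, ..., 5 becomes 1), so the two experiments are not assigned independently. A minimal sketch of the blockwise procedure from the question, drawn separately per group and per experiment, might look like this. The helper name assign_blocks, the group sizes, and the column names are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: assign k treatments to n subjects by stacking
# independent random permutations of 1:k and truncating to length n.
assign_blocks <- function(n, k = 5) {
  n_blocks <- ceiling(n / k)
  draws <- unlist(lapply(seq_len(n_blocks), function(i) sample.int(k)))
  draws[seq_len(n)]
}

set.seed(124)
# Assumed group sizes, chosen only for illustration.
df <- data.frame(group = rep(1:3, times = c(8, 23, 13)))

# ave() applies the helper within each group; x is only used for its length.
df$experiment_1 <- ave(df$group, df$group, FUN = function(x) assign_blocks(length(x)))
df$experiment_2 <- ave(df$group, df$group, FUN = function(x) assign_blocks(length(x)))

# Within every group, treatment counts differ by at most one.
with(df, table(group, experiment_1))
with(df, table(group, experiment_2))
```

Because each experiment gets its own call, the two columns are drawn independently of each other.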


Answer 2:

How about this function:

# shuffle by index: sample(x, n) would treat a single number x as 1:x when n is 1
f <- function(n,m) {x <- c( rep(1:m,n%/%m), sample(1:m,n%%m) ); x[sample.int(n)]}

Here "n" is the group size and "m" is the number of treatments. Each treatment must appear at least "n %/% m" times in the group. The remaining "n %% m" group members are assigned distinct treatments chosen at random. The vector "c( rep(1:m,n%/%m), sample(1:m,n%%m) )" contains these treatment numbers. Finally, the vector is shuffled into a random order.

> f(8,5)
[1] 5 3 1 5 4 2 2 1
> f(8,5)
[1] 4 5 3 4 2 2 1 1
> f(8,5)
[1] 4 2 1 5 3 5 2 3
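
As a worked check of the arithmetic for the group of 23 from the question: 23 %/% 5 is 4 and 23 %% 5 is 3, so every treatment should appear 4 times and exactly 3 treatments should appear a fifth time. The sketch below includes a copy of f (written with an index shuffle, which also handles n = 1) so that it runs on its own.

```r
# Copy of f() so the snippet is self-contained; shuffling by index
# avoids sample()'s special case for a length-one numeric input.
f <- function(n, m) {
  x <- c(rep(1:m, n %/% m), sample(1:m, n %% m))
  x[sample.int(n)]
}

n <- 23; m <- 5
n %/% m   # 4: guaranteed minimum count per treatment
n %% m    # 3: number of treatments that get one extra subject

# factor() keeps all m levels so that zero counts would show up in the table.
counts <- table(factor(f(n, m), levels = 1:m))
range(counts)      # 4 5
sum(counts == 5)   # 3
```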

Here is a function that creates a dataframe, using the above function:

Plan <- function( groupSizes, numExp=2, numTreatment=5 )
{
  numGroups <- length(groupSizes)
  df <- data.frame( group = rep(1:numGroups,groupSizes) )

  for ( e in 1:numExp )
  {
    df <- cbind(df,unlist(lapply(groupSizes,function(n){f(n,numTreatment)})))
    colnames(df)[e+1] <- sprintf("Exp_%i", e)
  }
  return(df)
}

Example:

> P <- Plan(c(8,23,13,19))
> P
   group Exp_1 Exp_2
1      1     4     1
2      1     1     4
3      1     2     2
4      1     2     1
5      1     3     5
6      1     5     5
7      1     1     2
8      1     3     3
9      2     5     1
10     2     2     1
11     2     5     2
12     2     1     2
13     2     2     1
14     2     1     4
15     2     3     5
16     2     5     3
17     2     2     4
18     2     5     4
19     2     2     5
20     2     1     1
21     2     4     2
22     2     3     3
23     2     4     3
24     2     2     5
25     2     3     3
26     2     5     2
27     2     1     5
28     2     3     4
29     2     4     4
30     2     4     2
31     2     4     3
32     3     2     5
33     3     5     3
34     3     5     1
35     3     5     1
36     3     2     5
37     3     4     4
38     3     1     4
39     3     3     2
40     3     3     2
41     3     3     3
42     3     1     1
43     3     4     2
44     3     4     4
45     4     5     1
46     4     3     1
47     4     1     2
48     4     1     5
49     4     3     3
50     4     3     1
51     4     4     5
52     4     2     4
53     4     5     3
54     4     2     1
55     4     4     2
56     4     2     5
57     4     4     4
58     4     5     3
59     4     5     4
60     4     1     2
61     4     2     5
62     4     3     2
63     4     4     4

Check the distribution:

> with(P,table(group,Exp_1))
     Exp_1
group 1 2 3 4 5
    1 2 2 2 1 1
    2 4 5 4 5 5
    3 2 2 3 3 3
    4 3 4 4 4 4
> with(P,table(group,Exp_2))
     Exp_2
group 1 2 3 4 5
    1 2 2 1 1 2
    2 4 5 5 5 4
    3 3 3 2 3 2
    4 4 4 3 4 4


Answer 3:

The design of efficient experiments is a science of its own, and there are a few R packages dealing with this issue:

https://cran.r-project.org/web/views/ExperimentalDesign.html

I am afraid your approach does not make optimal use of resources, no matter how you create the samples...

However this might help:

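n <- 23
# subgroup labels: blocks of 5 subjects, the last block may be shorter
group <- rep(seq_len(ceiling(n/5)), each = 5)[1:n]
exp1 <- rep(NA, length(group))
for(i in 1:max(group)) {
    # random permutation of the 5 treatments, truncated to the subgroup size
    exp1[which(group == i)] <- sample(1:5)[1:sum(group == i)]
}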
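
The loop above can also be wrapped in a function so that the group size and the number of treatments are parameters. This is only a sketch: the name assign_by_subgroup and the argument k are illustrative assumptions, and subgroups are taken to be consecutive blocks of k subjects.

```r
# Hypothetical wrapper around the loop above: n subjects, k treatments.
assign_by_subgroup <- function(n, k = 5) {
  # Subgroup labels 1, 1, ..., 2, 2, ...: blocks of k, the last one possibly shorter.
  subgroup <- rep(seq_len(ceiling(n / k)), each = k)[seq_len(n)]
  treatment <- integer(n)
  for (i in unique(subgroup)) {
    idx <- which(subgroup == i)
    # A random permutation of 1:k, truncated for a short final subgroup.
    treatment[idx] <- sample.int(k)[seq_along(idx)]
  }
  treatment
}

set.seed(1)
table(factor(assign_by_subgroup(23), levels = 1:5))  # counts are 4 or 5
table(factor(assign_by_subgroup(30), levels = 1:5))  # every count is 6
```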


Answer 4:

Not exactly sure if this meets all your constraints, but you could use the randomizr package:

library(randomizr)
experiment_1 <- complete_ra(N = 23, num_arms = 5)
experiment_2 <- block_ra(experiment_1, num_arms = 5)
table(experiment_1)
table(experiment_2)
table(experiment_1, experiment_2)

Produces output like this:

> table(experiment_1)
experiment_1
T1 T2 T3 T4 T5 
 4  5  5  4  5 
> table(experiment_2)
experiment_2
T1 T2 T3 T4 T5 
 6  3  6  4  4 
> table(experiment_1, experiment_2)
            experiment_2
experiment_1 T1 T2 T3 T4 T5
          T1  2  0  1  1  0
          T2  1  1  1  1  1
          T3  1  1  1  1  1
          T4  1  0  2  0  1
          T5  1  1  1  1  1