I am running an experiment with two experiments: experiment_1 and experiment_2. Each experiment has 5 different treatments (i.e. 1, 2, 3, 4, 5). We are trying to randomly assign the treatments within groups.
We would like to do this via sampling without replacement iteratively within each group. We want to do this to insure that we get as a balanced a sample as possible in the treatment (e.g. we don't want to end up with 4 subjects in group 1 getting assigned to treatment 2 and no one getting treatment 1). So if a group has 23 subjects, we want to split the respondent into 4 subgroups of 5, and 1 subgroup of 3. We then want to randomly sample without replacement across the first subgroup of 5, so everyone gets assigned 1 of the treatments, do the same things for the the second, third and 4th subgroup of 5, and for the final subgroup of 3 randomly sample without replacement. So we would guarantee that every treatment is assigned to at least 4 subjects, and 3 are assigned to 5 subjects within this group. We would like to do this for all the groups in the experiment and for both treatments. The resultant output would look something like this...
group experiment_1 experiment_2
[1,] 1 5 3
[2,] 1 3 2
[3,] 1 4 4
[4,] 1 1 5
[5,] 1 2 1
[6,] 1 2 3
[7,] 1 4 1
[8,] 1 3 2
[9,] 2 5 5
[10,] 2 1 4
[11,] 2 3 4
[12,] 2 1 5
[13,] 2 2 1
. . . .
. . . .
. . . .
I know how to use the sample
function, but am unsure how to sample without replacement within each group, so that our output corresponds to above described procedure. Any help would be appreciated.
I think we just need to shuffle sample IDs, see this example:
Not exactly sure if this meets all your constraints, but you could use the
randomizr
package:Produces output like this:
The design of efficient experiments is a science on its own and there are a few R-packages dealing with this issue:
https://cran.r-project.org/web/views/ExperimentalDesign.html
I am afraid your approach is not optimal regarding the resources, no matter how you create the samples...
However this might help:
How about this function:
"n" is the group size, "m" the number of treatments. Each treatment must be containt at least "n %/% m" times in the group. The treatment numbers of the remaining "n %% m" group members are assigned arbitrarily without repetition. The vector "c( rep(1:m,n%/%m), sample(1:m,n%%m) )" contains these treatment numbers. Finally the "sample" function perturbes these numbers.
Here is a function that creates a dataframe, using the above function:
Example:
Check the distribution: