I have a data frame with several descriptor variables (trt, individual, session). I want to randomly select a fraction of the possible trt x individual combinations while controlling for the session variable, so that no two sampled rows share the same session number. Here is what my data frame looks like:
trt <- c(rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9))
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
data <- rnorm(81, mean = 4, sd = 1)
df <- data.frame(trt, individual, session, data)
df
trt individual session data
1 A Bob 1 3.72013685581385
2 A Nancy 1 3.97225419000673
3 A Tim 1 4.44714175686225
4 B Bob 2 5.00024599458127
5 B Nancy 2 3.43615965145765
6 B Tim 2 6.7920094635501
7 C Bob 3 4.36315054477571
8 C Nancy 3 5.07117348146375
9 C Tim 3 4.38503325758969
10 A Bob 4 4.30677162933005
11 A Nancy 4 1.89311687510669
12 A Tim 4 3.09084920968413
13 B Bob 5 3.10436190897144
14 B Nancy 5 3.59454992439722
15 B Tim 5 3.40778069131207
16 C Bob 6 4.00171937800892
17 C Nancy 6 0.14578811080644
18 C Tim 6 4.20754733296227
19 A Bob 7 3.69131009783284
20 A Nancy 7 4.7025756891679
21 A Tim 7 4.46196017363017
22 B Bob 8 3.97573281432736
23 B Nancy 8 4.5373185942686
24 B Tim 8 2.40937847038141
25 C Bob 9 4.57519884980087
26 C Nancy 9 5.19143914630448
27 C Tim 9 4.83144732833874
28 A Bob 10 3.01769965527235
29 A Nancy 10 5.17300616827746
30 A Tim 10 4.65432284571663
31 B Bob 11 4.50892032922527
32 B Nancy 11 3.38082717995663
33 B Tim 11 4.92022245677209
34 C Bob 12 4.54149796547394
35 C Nancy 12 3.21992774137179
36 C Tim 12 3.74507360931023
37 A Bob 13 3.39524949548056
38 A Nancy 13 4.17518916890901
39 A Tim 13 3.02932375225388
40 B Bob 14 3.59660910672907
41 B Nancy 14 2.08784850191654
42 B Tim 14 3.98446125755258
43 C Bob 15 4.01837496797085
44 C Nancy 15 3.40610126858125
45 C Tim 15 4.57107635588582
46 A Bob 16 3.15839276840723
47 A Nancy 16 2.19932140340504
48 A Tim 16 4.77588798035668
49 B Bob 17 4.3524768657397
50 B Nancy 17 4.49071625925856
51 B Tim 17 4.02576463486266
52 C Bob 18 3.74783360762117
53 C Nancy 18 2.84123227236184
54 C Tim 18 3.2024114782253
55 A Bob 19 4.93837445490921
56 A Nancy 19 4.7103051496802
57 A Tim 19 6.22083635045134
58 B Bob 20 4.5177747677824
59 B Nancy 20 1.78839270771153
60 B Tim 20 5.07140678136995
61 C Bob 21 3.47818616035335
62 C Nancy 21 4.28526474048439
63 C Tim 21 4.22597602946575
64 A Bob 22 1.91700925257901
65 A Nancy 22 2.96317997587458
66 A Tim 22 2.53506974227672
67 B Bob 23 5.52714403395316
68 B Nancy 23 3.3618513551059
69 B Tim 23 4.85869007113978
70 C Bob 24 3.4367068543959
71 C Nancy 24 4.47769879000349
72 C Tim 24 5.77340483757836
73 A Bob 25 4.78524317734622
74 A Nancy 25 3.55373702554664
75 A Tim 25 2.88541465503637
76 B Bob 26 4.62885302019139
77 B Nancy 26 3.59430293369092
78 B Tim 26 2.29610255924296
79 C Bob 27 4.38433001299722
80 C Nancy 27 3.77825207859976
81 C Tim 27 2.12163194694365
How do I pull out 2 rows for each trt x individual combination so that every selected row has a unique session number? This is an example of what I want the result to look like:
trt individual session data
1 A Bob 1 3.72013685581385
5 B Nancy 2 3.43615965145765
7 C Bob 3 4.36315054477571
12 A Tim 4 3.09084920968413
15 B Tim 5 3.40778069131207
17 C Nancy 6 0.14578811080644
19 A Bob 7 3.69131009783284
29 A Nancy 10 5.17300616827746
31 B Bob 11 4.50892032922527
34 C Bob 12 4.54149796547394
39 A Tim 13 3.02932375225388
40 B Bob 14 3.59660910672907
47 A Nancy 16 2.19932140340504
51 B Tim 17 4.02576463486266
54 C Tim 18 3.2024114782253
59 B Nancy 20 1.78839270771153
71 C Nancy 24 4.47769879000349
81 C Tim 27 2.12163194694365
I have tried a couple of things with no luck.
First, I tried randomly selecting two rows for each trt x individual combination, but I end up with duplicate session values:
library(data.table)
setDT(df)
df[, .SD[sample(.N, 2)], keyby = .(trt, individual)]
trt individual session data
1: A Bob 25 2.7560788894668
2: A Bob 19 4.12040841647523
3: A Nancy 4 5.35362338127901
4: A Nancy 19 5.51636882737692
5: A Tim 19 5.10553640201998
6: A Tim 1 2.77380671625473
7: B Bob 23 3.50585105164409
8: B Bob 8 3.58167259470814
9: B Nancy 23 2.85301307507985
10: B Nancy 8 2.85179395539781
11: B Tim 26 2.40666507132474
12: B Tim 20 3.31276311351286
13: C Bob 24 3.19076007024549
14: C Bob 3 3.59146613276121
15: C Nancy 9 4.46606667880457
16: C Nancy 15 2.25405252536256
17: C Tim 12 4.43111661206133
18: C Tim 27 4.23868848646589
Second, I tried randomly selecting one row per session number and then pulling 2 rows per trt x individual combination, but it typically comes back with an error, since the random selection doesn't grab an equal number of rows for each trt x individual combination:
ind <- sapply(unique(df$session), function(x) sample(which(df$session == x), 1))
df.unique <- df[ind, ]
df.sub <- df.unique[, .SD[sample(.N, 2)], by = .(trt, individual)]
Error in `[.data.frame`(df.unique, , .SD[sample(.N, 2)], by = .(trt, individual)) :
unused argument (by = .(trt, individual))
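For reference, here is a minimal sketch of one direction that might work. It relies on a property of the example data above: every session belongs to exactly one trt. That is an assumption and it may not hold for real data. Under that assumption, each trt has 9 sessions and only needs 3 individuals x 2 = 6 of them, so the sessions can be drawn without replacement within each trt and then dealt out to the individuals. The names `dt`, `n_per_combo`, `picks`, and `result` are made up for illustration.

```r
library(data.table)

set.seed(1)  # only so the sketch is reproducible

# Rebuild the example data as a data.table
dt <- data.table(
  trt        = rep(rep(c("A", "B", "C"), each = 3), 9),
  individual = rep(c("Bob", "Nancy", "Tim"), 27),
  session    = rep(1:27, each = 3),
  data       = rnorm(81, mean = 4, sd = 1)
)

# Assumption: each session is tied to a single trt (true for the example data)
stopifnot(all(dt[, uniqueN(trt), by = session]$V1 == 1))

n_per_combo <- 2  # rows wanted per trt x individual combination

# Within each trt, draw distinct sessions and hand n_per_combo of them
# to each individual; sample.int() returns the indices in random order,
# so the assignment of sessions to individuals is random as well
picks <- dt[, {
  ids    <- unique(individual)
  sess   <- unique(session)
  chosen <- sess[sample.int(length(sess), length(ids) * n_per_combo)]
  .(individual = rep(ids, each = n_per_combo), session = chosen)
}, by = trt]

# Look up the matching rows (and their data values) in the full table
result <- dt[picks, on = .(trt, individual, session)]
result[order(session)]
```

Because the sessions are sampled without replacement inside each trt, and no session is shared between trts in this data, no session number can appear twice in `result`. If a trt had fewer sessions than `length(ids) * n_per_combo`, `sample.int()` would stop with an error instead of silently reusing a session.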
Thanks in advance for your help!