Piggybacking off my own previous question, "python pandas: assign control vs. treatment groupings randomly based on %".
Thanks to @MaxU, I know how to randomly assign control/treatment flags when I have 2 groups, but what if I have 3 groups or more?
For example:
df.head()
customer_id | Group | many other columns
ABC         | 1     |
CDE         | 3     |
BHF         | 2     |
NID         | 1     |
WKL         | 3     |
SDI         | 2     |
JSK         | 1     |
OSM         | 3     |
MPA         | 2     |
MAD         | 1     |
pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))
Group 1 : 270
Group 2 : 180
Group 3 : 330
I have a great answer for the case where I only have two groups:
df['Flag'] = df.groupby('Group')['customer_id']\
.transform(lambda x: np.random.choice(['Control','Test'], len(x),
p=[.5,.5] if x.name==1 else [.4,.6]))
But what if I want to split it this way:
- Group 1: 50% Control & 50% Test
- Group 2: 40% Control & 60% Test
- Group 3: 20% Control & 80% Test
@MaxU's answer is great, but unfortunately the split is not exact:
d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
df['Flag'] = df.groupby('Group')['customer_id'] \
.transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
When I test it, I don't get exact splits:
pd.pivot_table(df,index=['Group'],values=["customer_id"],columns=['Flag'], aggfunc=lambda x: len(x.unique()))
          Control  Treatment
Group 1:  138      132
Group 2:  78       102
Group 3:  79       251
Group 1 should be 135/135.
It sounds like you're looking for a way to split your customer_ids into exact proportions rather than rely on chance. Here's one way to do that using pandas.qcut and np.random.permutation.
What's going on here?
- assigner grabs the group name and proportions from the predefined dictionary and calls pd.qcut to split the rows into 0 (control) and 1 (treatment).
- np.random.permutation then shuffles the assignments.
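A minimal sketch of an assigner along these lines, not necessarily the exact code intended. It assumes the proportions in each dictionary entry sum to 1, and it labels rows 'Control'/'Test' directly instead of 0/1. The names proportions, labels and sizes, and the generated customer IDs, are hypothetical; only the group sizes and percentages come from the question.

```python
import numpy as np
import pandas as pd

# Control/Test proportions per group, as given in the question.
proportions = {1: [0.5, 0.5], 2: [0.4, 0.6], 3: [0.2, 0.8]}
labels = ["Control", "Test"]  # assumption: string labels instead of 0/1


def assigner(ids):
    """Return exactly proportioned, shuffled labels for one group."""
    n = len(ids)
    # Cumulative quantile edges, e.g. [0.0, 0.4, 1.0] for a 40/60 split.
    edges = np.cumsum([0.0] + proportions[ids.name]).tolist()
    # Cutting the positions 0..n-1 at those quantiles gives one
    # contiguous block of labels per proportion.
    blocks = np.asarray(pd.qcut(np.arange(n), q=edges, labels=labels))
    # Shuffle so the labels are not tied to row order within the group.
    return np.random.permutation(blocks)


# Hypothetical data with the group sizes from the question.
np.random.seed(0)
sizes = {1: 270, 2: 180, 3: 330}
df = pd.DataFrame({
    "customer_id": [f"C{i:04d}" for i in range(sum(sizes.values()))],
    "Group": np.random.permutation(
        np.repeat(list(sizes), list(sizes.values()))
    ),
})

# transform passes each group's customer_id Series with .name set to the
# group key, the same mechanism the np.random.choice version relies on.
df["Flag"] = df.groupby("Group")["customer_id"].transform(assigner)

counts = pd.crosstab(df["Group"], df["Flag"])
print(counts)
```

With these group sizes the counts should come out as 135/135, 72/108 and 66/264 (Control/Test) for groups 1, 2 and 3. When a group's size times its proportion is not a whole number, an exact split is impossible and the cut falls on a neighbouring whole number instead.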