I am working on an experiment design where I need to split a dataframe df into control and treatment groups, with the split percentage depending on a pre-existing grouping.
This is the dataframe df:
df.head()
customer_id | Group | many other columns
ABC         | 1     |
CDE         | 1     |
BHF         | 2     |
NID         | 1     |
WKL         | 2     |
SDI         | 2     |
pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))
Group 1 : 55394
Group 2 : 34889
Now I need to add a column labeled "Flag" to df. For Group 1, I want to randomly assign 50% "Control" and 50% "Test". For Group 2, I want to randomly assign 40% "Control" and 60% "Test".
The output I am looking for:
customer_id | Group | many other columns | Flag
ABC         | 1     |                    | Test
CDE         | 1     |                    | Control
BHF         | 2     |                    | Test
NID         | 1     |                    | Test
WKL         | 2     |                    | Control
SDI         | 2     |                    | Test
We can use the numpy.random.choice() method:
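The snippet that originally followed is not shown here, so what follows is only a minimal sketch of one way to apply numpy.random.choice() per group. The toy dataframe, the `probs` and `labels` names, and the seed are assumptions made for illustration; only the column names and the 50/50 and 40/60 splits come from the question. Because each row is drawn independently, the resulting proportions are approximately, not exactly, the requested percentages.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df (assumption: the real frame has more rows and columns).
df = pd.DataFrame({
    "customer_id": [f"C{i:05d}" for i in range(10_000)],
    "Group": np.repeat([1, 2], [6_000, 4_000]),
})

labels = ["Control", "Test"]
# [P(Control), P(Test)] for each pre-existing group, as stated in the question.
probs = {1: [0.5, 0.5], 2: [0.4, 0.6]}

np.random.seed(0)  # arbitrary seed, only to make the sketch reproducible

# Build the Flag column group by group.
flag = np.empty(len(df), dtype=object)
for group, p in probs.items():
    mask = (df["Group"] == group).to_numpy()
    # Draw one label per row of this group with the group's probabilities.
    flag[mask] = np.random.choice(labels, size=mask.sum(), p=p)
df["Flag"] = flag

# Check the realised split per group (shares should be close to the targets).
print(df.groupby("Group")["Flag"].value_counts(normalize=True))
```

If an exact split is required rather than an approximate one, a different approach is needed, for example shuffling each group and labelling the first share of rows "Control".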
UPDATE: