Business Context: To put it simply, I would like to split my data set in to test and control and run some campaigns in the future. But there seems to be lot of complexities involved when I try to split it.
Data:
RetailerCode CID Segment
A6005 13SVC15 High
A6005 19VDE1F Low
A6005 1B3BD1F Medium
A6005 1B3HB48 Medium
A6005 1B3HB49 Low
A9006 1B3HB40 High
A9006 1B3HB41 High
A9006 1B3HB43 Low
A9006 1B3HB46 Medium
I have a master table like this which has the list of customers and their retailers, segment etc. (it has more than 30 columns, but I don’t want to show everything here). I would like to split this data set by adding a column, lets say “test_control” which will tell us whether particular row is test or control category. I can split this randomly but I need to follow the following rules,
- If a customer is tied to two or more retailers, then he should be in control group
- Each retailer will be provided with list of customers to target for the campaigns and the retailer will run the campaign. Here,
o Test-Control split should be done at Retailer level and then at segment level. For example, For each retailer
10% of their High customers to control and remaining 90% of their high customers to test.
10% of their Medium customers to control and remaining 90% of their Medium customers to test
10% of their Low customers to control and remaining 90% of their Low customers to test.
I can share the query that I wrote but that will totally confuse you guys