Test (90%) and Control (10%) split at retailer lev

2019-08-26 08:06发布

问题:

Business Context: To put it simply, I would like to split my data set in to test and control and run some campaigns in the future. But there seems to be lot of complexities involved when I try to split it.

Data:

RetailerCode    CID        Segment
A6005         13SVC15       High
A6005         19VDE1F       Low
A6005         1B3BD1F       Medium
A6005         1B3HB48       Medium
A6005         1B3HB49       Low
A9006         1B3HB40       High
A9006         1B3HB41       High
A9006         1B3HB43       Low
A9006         1B3HB46       Medium

I have a master table like this which has the list of customers and their retailers, segment etc. (it has more than 30 columns, but I don’t want to show everything here). I would like to split this data set by adding a column, lets say “test_control” which will tell us whether particular row is test or control category. I can split this randomly but I need to follow the following rules,

-   If a customer is tied to two or more retailers, then he should be in control group
-   Each retailer will be provided with list of customers to target for the campaigns and the retailer will run the campaign. Here, 
o   Test-Control split should be done at Retailer level and then at segment level. For example,  For each retailer
   10% of their High customers to control and remaining 90% of their high customers to test.
   10% of their Medium customers to control and remaining 90% of their Medium customers to test
   10% of their Low customers to control and remaining 90% of their Low customers to test.

I can share the query that I wrote but that will totally confuse you guys