Random Sample with multiple probabilities in R [du

Posted 2020-06-28 12:48

Question:

I need to draw a sample of subjects from a list to assign them as a Control Group for a study, and the sample has to have a similar composition of variables. I am trying to do this in R with the sample() function, but I don't know how to specify different probabilities for each variable. Let's say I have a table with the following headers:

ID Name Campaign Gender

I need a sample of 10 subjects with the following composition of Campaign attributes:

D2D --> 25%

F2F --> 38%

TM --> 17%

WW --> 21%

This means that 25% of the subjects in my data set come from a Door to Door Campaign (D2D), 38% from a Face to Face Campaign (F2F), and so on.

The gender composition is as follows:

Male --> 54%

Female --> 46%

When I get a random sample of 10 subjects I need it to have a similar composition.
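With only 10 subjects these percentages can only be matched approximately. Multiplying each proportion by the sample size and rounding gives the target number of subjects per category. This is just an illustration: the vectors below restate the question's percentages (which sum to 101% because of rounding), and note that R's round() rounds an exact half to even, so 2.5 becomes 2.

```r
# Proportions stated in the question
campaign_props <- c(D2D = 0.25, F2F = 0.38, TM = 0.17, WW = 0.21)
gender_props   <- c(Male = 0.54, Female = 0.46)

n <- 10
# Expected number of subjects per category in a sample of size n
campaign_counts <- round(n * campaign_props)
gender_counts   <- round(n * gender_props)

print(campaign_counts)  # D2D 2, F2F 4, TM 2, WW 2
print(gender_counts)    # Male 5, Female 5
```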

I have been searching for hours, and the closest thing I found was this answer: taking data sample in R, but I need to assign more than one probability.

I am sure this could help anyone who wants to get a representative sample from a data set.

Answer 1:

It sounds like you are interested in taking a random stratified sample. You could do this using the stratsample() function from the survey package.
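The core idea of a stratified sample is simply a fixed number of random draws within each stratum. As a rough base-R sketch of that idea (not the survey package's own code), the hypothetical helper sample_by_stratum() below takes a vector of stratum labels and a named vector of counts and returns the selected row numbers. The campaign vector is made-up data built from the question's percentages.

```r
# Hypothetical helper (not from any package): draw counts[[s]] rows at random
# from each stratum s and return the selected row numbers.
sample_by_stratum <- function(strata, counts) {
  strata <- as.character(strata)
  idx <- seq_along(strata)
  picked <- lapply(names(counts), function(s) {
    rows <- idx[strata == s]
    # sample.int() avoids sample()'s special case when rows has length 1
    rows[sample.int(length(rows), counts[[s]])]
  })
  unlist(picked)
}

set.seed(1)
# Made-up population: 101 subjects split by campaign as in the question
campaign <- rep(c("D2D", "F2F", "TM", "WW"), times = c(25, 38, 17, 21))
sel <- sample_by_stratum(campaign, c(D2D = 2, F2F = 4, TM = 2, WW = 2))

# Each campaign appears exactly as often as requested
table(campaign[sel])
```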

In the example below, I create some fake data to mimic what you have, then I define a function to take a proportional stratified random sample, and then I apply the function to the fake data.

library(survey)  # provides stratsample()
set.seed(1)      # make the random draws reproducible

# example data
ndf <- 1000
df <- data.frame(ID=sample(ndf), Name=sample(ndf), 
    Campaign=sample(c("D2D", "F2F", "TM", "WW"), ndf, prob=c(0.25, 0.38, 0.17, 0.21), replace=TRUE),
    Gender=sample(c("Male", "Female"), ndf, prob=c(0.54, 0.46), replace=TRUE))

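# function to take a random proportional stratified sample of size (approximately) n;
# returns the selected row numbers
rpss <- function(stratum, n) {
    # share of the population in each stratum
    props <- table(stratum)/length(stratum)
    # whole number of subjects to draw per stratum
    nstrat <- as.vector(round(n*props))
    # draw at least one subject from every stratum; together with rounding,
    # this means the total can differ slightly from n
    nstrat[nstrat==0] <- 1
    names(nstrat) <- names(props)
    stratsample(stratum, nstrat)
}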

# take a random proportional stratified sample of size 10,
# using every Campaign x Gender combination as a stratum
selrows <- rpss(stratum=interaction(df$Campaign, df$Gender, drop=TRUE), n=10)
df[selrows, ]
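Because rpss() rounds each stratum independently and bumps empty strata up to 1, the number of rows returned is not guaranteed to be exactly n. If an exact sample size matters, one option is largest-remainder allocation: take the floor of each stratum's share, then give the leftover units to the strata with the largest fractional parts. The function alloc_exact() below is a hypothetical sketch of that allocation, shown on made-up campaign data; it is not part of the survey package.

```r
# Hypothetical alternative to the rounding step in rpss(): largest-remainder
# allocation, whose counts always sum exactly to n.
alloc_exact <- function(stratum, n) {
  props <- table(stratum) / length(stratum)
  raw <- n * as.vector(props)
  counts <- floor(raw)
  # hand the leftover units to the strata with the largest fractional parts
  leftover <- n - sum(counts)
  if (leftover > 0) {
    extra <- order(raw - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[extra] <- counts[extra] + 1
  }
  names(counts) <- names(props)
  counts
}

# Made-up population: 101 subjects split by campaign as in the question
stratum <- rep(c("D2D", "F2F", "TM", "WW"), times = c(25, 38, 17, 21))
alloc_exact(stratum, 10)  # D2D 2, F2F 4, TM 2, WW 2
```

The result is a named count vector, so inside rpss() it could take the place of nstrat, e.g. stratsample(stratum, alloc_exact(stratum, n)). The trade-off is that this sketch drops the at-least-one rule: with many small Campaign x Gender strata and n = 10, some combinations may get no subjects at all.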