I have a data frame in the format:
head(subset)
#   ants age lc
# 1    0   1  1
# 2    1   2  1
# 3    1   2  0
# 4    0   1  1
# 5    1   3  0
I need to create a new data frame with random samples stratified by age and lc. For example, I want 30 samples from age = 1 and lc = 1, 30 samples from age = 1 and lc = 0, and so on.
I looked at random sampling approaches like this:
newdata <- function(subset, age, 30)
but that is not what I want.
Here's some data:
You want a split-apply-combine strategy: split your data.frame (d in this example) into one piece per age/lc combination, sample rows/observations from each piece, and then combine them back together with rbind. Here's how it works:
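A minimal base R sketch of that approach. The data `d` is hypothetical (50 rows in each of the six age/lc combinations, not the asker's data), and the names `pieces`, `sampled` and `out` are illustrative:

```r
# Hypothetical example data: 50 rows per age/lc stratum, rows shuffled
set.seed(42)
d <- expand.grid(id = 1:50, age = 1:3, lc = 0:1)
d$ants <- sample(0:1, nrow(d), replace = TRUE)
d <- d[sample(nrow(d)), c("ants", "age", "lc")]

# Split: one data.frame per age/lc combination
pieces <- split(d, list(d$age, d$lc))

# Apply: draw 30 rows without replacement from each piece
sampled <- lapply(pieces, function(x) x[sample(nrow(x), 30), ])

# Combine: stack the per-stratum samples back into one data.frame
out <- do.call(rbind, sampled)

# Each age/lc cell should now contain exactly 30 rows
table(out$age, out$lc)
```

If a stratum has fewer than 30 rows, sample() stops with an error; use replace = TRUE or min(30, nrow(x)) as the size in that case.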
Unless I've misunderstood the question, this is very easy to do with simple base functions.
Step 1: Create a stratum indicator using the interaction function.
Step 2: Use tapply on a sequence of row indices to pick the indices of the random sample within each stratum.
Step 3: Subset the data with those indices.
Using the data example from @Thomas:
Verify appropriate stratification
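A sketch of the three steps and the check, assuming hypothetical data shaped like the question's (50 rows per age/lc stratum; this is not @Thomas's original data):

```r
# Hypothetical example data: 50 rows per age/lc stratum, rows shuffled
set.seed(42)
d <- expand.grid(id = 1:50, age = 1:3, lc = 0:1)
d$ants <- sample(0:1, nrow(d), replace = TRUE)
d <- d[sample(nrow(d)), c("ants", "age", "lc")]

# Step 1: stratum indicator, one factor level per age/lc combination
stratum <- interaction(d$age, d$lc)

# Step 2: within each stratum, sample 30 of its row positions.
# i[sample.int(length(i), 30)] avoids sample()'s special case for length-1 input.
idx <- unlist(tapply(seq_len(nrow(d)), stratum,
                     function(i) i[sample.int(length(i), 30)]))

# Step 3: subset the data with those row positions
out <- d[idx, ]

# Verify appropriate stratification: every age/lc cell should be 30
table(out$age, out$lc)
```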
I would suggest using either stratified from my "splitstackshape" package, or sample_n from the "dplyr" package.
For stratified, you specify the dataset, the stratifying columns, and either an integer giving the number of rows you want from each group or a decimal giving the fraction you want returned (for example, .1 returns 10% of each group).
For sample_n, you first create a grouped table (using group_by) and then specify the number of observations you want from each group. If you want proportional sampling instead, use sample_frac.
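A sketch of the dplyr route on hypothetical data (50 rows per age/lc group). With splitstackshape, the equivalent fixed-size call would be along the lines of stratified(d, c("age", "lc"), 30); that package is not used below. In dplyr 1.0.0 and later, sample_n and sample_frac are superseded by slice_sample(n = ...) and slice_sample(prop = ...), though they still work.

```r
library(dplyr)

# Hypothetical example data: 50 rows per age/lc group, rows shuffled
set.seed(42)
d <- expand.grid(id = 1:50, age = 1:3, lc = 0:1)
d$ants <- sample(0:1, nrow(d), replace = TRUE)
d <- d[sample(nrow(d)), c("ants", "age", "lc")]

# Fixed size: 30 rows from each age/lc group
out_n <- d %>%
  group_by(age, lc) %>%
  sample_n(30) %>%
  ungroup()

# Proportional: 10% of the rows from each age/lc group
out_frac <- d %>%
  group_by(age, lc) %>%
  sample_frac(0.1) %>%
  ungroup()
```

Without replace = TRUE, sample_n errors for any group smaller than the requested size.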
See the function strata from the package sampling. It performs stratified simple random sampling and returns the selected sample: the selected rows are identified by ID_unit, and two extra columns are added, the inclusion probabilities (Prob) and a stratum indicator (Stratum). See the example.
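A sketch with sampling::strata, assuming hypothetical data with 50 rows per age/lc stratum. strata expects the data sorted by the stratification variables, and size takes one sample size per stratum in that order; getdata then pulls the selected rows back out of the original data:

```r
library(sampling)

# Hypothetical example data: 50 rows per age/lc stratum, rows shuffled
set.seed(42)
d <- expand.grid(id = 1:50, age = 1:3, lc = 0:1)
d$ants <- sample(0:1, nrow(d), replace = TRUE)
d <- d[sample(nrow(d)), c("ants", "age", "lc")]

# strata() wants the data sorted by the stratification variables
d <- d[order(d$age, d$lc), ]
n_strata <- nrow(unique(d[, c("age", "lc")]))

# Simple random sampling without replacement, 30 units per stratum
s <- strata(d, stratanames = c("age", "lc"),
            size = rep(30, n_strata), method = "srswor")

# Attach the original columns to the selected units
out <- getdata(d, s)
```

Here every stratum has 50 rows, so each selected unit's Prob is 30/50 = 0.6.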