Is there a good way of getting a sample of rows from part of a dataframe?
If I just have data such as
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
then I can easily sample the ages of three of the Fs with
sample(age[gender == "F"], 3)
and get something like
[1] 31 35 29
but if I turn this data into a dataframe
mydf <- data.frame(gender, age)
I cannot use the obvious
sample(mydf[mydf$gender == "F", ], 3)
though I can concoct something convoluted with an absurd number of brackets like
mydf[sample((1:nrow(mydf))[mydf$gender == "F"], 3), ]
and get what I want which is something like
  gender age
7      F  35
4      F  29
1      F  23
Is there a better way that takes me less time to work out how to write?
Your convoluted way is pretty much how to do it - I think all the answers will be variations on that theme.
For example, I like to generate the indices where mydf$gender == "F" first:
idx <- which(mydf$gender=="F")
Then I sample from that:
mydf[ sample(idx,3), ]
So in one line (although splitting it over multiple lines reduces the absurd number of brackets and probably makes the code easier to understand):
mydf[ sample( which(mydf$gender=='F'), 3 ), ]
While the "wheee I'm a hacker!" part of me prefers the one-liner, the sensible part of me says that even though the two-liner is two lines, it is much more understandable - it's just your choice.
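One caveat with sampling from an index vector: when only one row matches, sample(idx, 1) treats that single number as the upper bound of 1:idx rather than as the only candidate. A minimal sketch that avoids this is below. It assumes a hypothetical helper name, sample_matching, which is not part of base R, and it samples positions within idx instead of sampling idx itself:

```r
# Data from the question.
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)

# Hypothetical helper: sample n rows of df among those where keep is TRUE.
sample_matching <- function(df, keep, n) {
  idx <- which(keep)            # row numbers of the matching rows
  stopifnot(n <= length(idx))   # fail early if too few rows match
  # Sample positions within idx, so a length-one idx is handled correctly.
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

set.seed(1)  # only to make the draw reproducible
res <- sample_matching(mydf, mydf$gender == "F", 3)
single <- sample_matching(mydf, mydf$age == 37, 1)
```

For everyday use with several matching rows, the two-liner above behaves the same; the helper only matters when a group can shrink to a single row.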
You say you cannot use the obvious:
sample(mydf[mydf$gender == "F", ], 3)
but you could write your own function for doing it:
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
then run it on your subset selection:
sample.df(mydf[mydf$gender == "F", ], 3)
#   gender age
# 5      F  31
# 4      F  29
# 1      F  23
(Personally I find sample.df(subset(mydf, gender == "F"), 3) easier to read.)
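For context on why the direct call fails: sample() indexes its argument by position, and a data frame is a list of columns, so it samples columns, not rows. A small illustration using the question's data follows; the names females, one_col, err and rows exist only for this sketch:

```r
# Data from the question.
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)
females <- mydf[mydf$gender == "F", ]

length(females)                # number of columns, which is what sample() sees
one_col <- sample(females, 1)  # a one-column data frame, not one row

# Asking for 3 items from 2 columns without replacement raises an error.
err <- tryCatch(sample(females, 3), error = function(e) conditionMessage(e))

# Sampling row numbers instead returns rows, which is what sample.df does.
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
rows <- sample.df(females, 3)
```

The drop = FALSE argument keeps the result a data frame even if df has a single column.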
This is now simpler with the enhanced version of sample in my package:
# current devtools expects the repository as "user/repo"
library(devtools); install_github("krlmlr/kimisc")
library(kimisc)
sample.rows(subset(mydf, gender == "F"), 3)
See also this related answer for more detail.