Randomly sample a percentage of rows within a data frame

Related to this question.

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age) 

mydf[ sample( which(mydf$gender=='F'), 3 ), ]

Instead of selecting a fixed number of rows (3 in the example above), how can I randomly select 20% of the rows with "F"? That is, of the five rows with "F", how do I randomly sample 20%?

Tags: r row subset random-sample

4 Answers

smile是对你的礼貌

Reply #2 · 2019-03-24 15:05

To sample 20%, you can use this to get the sample size:

n = round(0.2 * nrow(mydf[mydf$gender == "F",]))
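A minimal sketch of how that size could feed into the sampling call from the question. The names f_rows and picked are illustrative and not part of the original answer. Drawing positions with sample.int() is a choice made here to sidestep the case where sample() is handed a single number and samples from 1:x instead.

```r
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

set.seed(42)  # only so the draw is repeatable

f_rows <- which(mydf$gender == "F")     # row indices of the "F" rows
n      <- round(0.2 * length(f_rows))   # 20% of 5 rows gives 1

# sample.int() draws positions within f_rows, so a one-element f_rows
# is handled correctly (sample(7, 1) would draw from 1:7 instead).
picked <- mydf[f_rows[sample.int(length(f_rows), n)], ]
picked
```

With only a few matching rows, round() can return 0 (for example, 20% of 2 rows), so ceiling() or max(1, ...) may be a better fit if at least one row is always wanted.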


爷的心禁止访问

Reply #3 · 2019-03-24 15:09

How about this:

mydf[ sample( which(mydf$gender=='F'), round(0.2*length(which(mydf$gender=='F')))), ]

Here 0.2 is your 20%, and length(which(mydf$gender=='F')) is the total number of rows with "F".


We Are One

Reply #4 · 2019-03-24 15:19

Self-promotion alert. I wrote a function that allows convenient stratified sampling, and I've included an option to subset levels from the grouping variables before sampling.

The function is called stratified and can be used in the following ways:

set.seed(1)
# Proportional sample
stratified(mydf, group="gender", size=.2, select=list(gender = "F"))
#   gender age
# 4      F  29
# Fixed-size sampling
stratified(mydf, group="gender", size=2, select=list(gender = "F"))
#   gender age
# 4      F  29
# 5      F  31

You can specify multiple groups (for example if your data frame included a "state" variable and you wanted to group by "state" and "gender" you would specify group = c("state", "gender")). You can also specify multiple "select" arguments (for example, if you wanted only female respondents from California and Texas, and your "state" variable used two-letter state abbreviations, you could specify select = list(gender = "F", state = c("CA", "TX"))).

The function itself can be found here or you can download and install the package (which gives you convenient access to the help pages and examples) by using install_github from the "devtools" package as follows:

# install.packages("devtools")
library(devtools)
install_github("mrdwab/mrdwabmisc")  # current devtools expects "user/repo"


Emotional °昔

Reply #5 · 2019-03-24 15:20

You can use the sample_frac() function from the dplyr package.

For example, to sample 20% of all rows in the data frame:

library(dplyr)
mydf %>% sample_frac(.2)

To sample 20% within each gender group:

mydf %>% group_by(gender) %>% sample_frac(.2)
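Since the question asks only for the "F" rows, one possible variation is to filter first and then sample. This is a sketch that assumes dplyr is installed; the name f_sample is illustrative and not from the answer.

```r
library(dplyr)

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

set.seed(42)  # only so the draw is repeatable

# Keep only the "F" rows, then draw 20% of them (5 rows, so 1 row).
f_sample <- mydf %>%
  filter(gender == "F") %>%
  sample_frac(0.2)
f_sample
```

In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(prop = ...), which could be swapped in here.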
