Randomly select groups (and all cases per group) i

Posted 2019-05-24 03:13

Question:

I have an R data frame with two levels of data: id and year. Within the groups defined by id, the years increase (the entire dataset has the same number of years per group), like so:

id    year    var1    var2
11A   2001    ...     ...
11A   2002    ...     ...
11A   2003    ...     ...
11A   2004    ...     ...
13B   2001    ...     ...
13B   2002    ...     ...
13B   2003    ...     ...
13B   2004    ...     ...
22Z   2001    ...     ...

I have about 20,000 groups in my data, of course way too many to make nice plots of growth curves. How do I randomly select about 20 of my ids (and with them all 4 rows of years corresponding to each selected id)?

Answer 1:

This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code, and could be done in one if you wanted.

#made-up data: 20000 rows, years drawn with replacement from 1990:2012
dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)),
   year=as.factor(as.character(sample(1990:2012, 20000, TRUE))),
   var1=rnorm(20000), var2=rnorm(20000))

#a look at the data
head(dat)

#sample 20 id's randomly
(ids <- sample(unique(dat$id), 20))

#narrow your data set
dat2 <- dat[dat$id %in% ids, ]
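
As a rough sketch of the one-line version mentioned above, the sampling and the indexing can be combined. The toy data frame below (its name, ids and sizes) is made up purely for illustration, and set.seed is only there to make the sketch reproducible.

```r
# Hypothetical toy data: 50 ids with 4 years each (not the asker's data)
set.seed(1)
toy <- data.frame(
  id   = rep(sprintf("id%02d", 1:50), each = 4),
  year = rep(2001:2004, times = 50),
  var1 = rnorm(200),
  stringsAsFactors = FALSE
)

# Sample 20 ids and keep all of their rows in a single step
toy_sub <- toy[toy$id %in% sample(unique(toy$id), 20), ]

# Sanity checks: how many ids were kept, and how many rows each one has
length(unique(toy_sub$id))
table(table(toy_sub$id))
```

Because %in% is evaluated row by row against the sampled ids, every row belonging to a selected id is retained, so each group stays complete.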


Answer 2:

subset(df, id %in% sample(levels(df$id), 20))

This assumes your data frame is called df and that your id is a factor (use unique instead of levels if it's not).
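
To illustrate the factor-versus-character point, here is a minimal sketch that should work for either type by sampling from unique(as.character(...)). The data frame contents are hypothetical. It also shows that a subset of a factor column keeps its unused levels, which droplevels can remove before plotting.

```r
# Hypothetical data frame with id stored as a factor
set.seed(42)
df <- data.frame(
  id   = factor(rep(c("11A", "13B", "22Z", "37C", "40D"), each = 4)),
  year = rep(2001:2004, times = 5)
)

# Works whether id is a factor or a character vector
picked <- sample(unique(as.character(df$id)), 2)
sub <- subset(df, id %in% picked)

# Subsetting keeps the unused factor levels; droplevels() removes them
nlevels(sub$id)   # still 5
sub <- droplevels(sub)
nlevels(sub$id)   # now 2
```

Dropping the unused levels matters mostly for plots and tables, where empty groups would otherwise still show up in legends or counts.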



Tags: r sample