Group androgynous names and sum amount for each ye

2019-09-09 17:26发布

问题:

I have a data frame with 4 columns titled 'year' 'name' 'sex' 'amount'. Here is a sample data set

set.seed(1)
    data = data.frame(year=sample(1950:2000, 50, replace=TRUE),name=sample(LETTERS, 50, replace=TRUE),
                       sex=sample(c("M", "F"), 50, replace=TRUE), amount=sample(40:100, 50, replace=TRUE))

I want to find only names that occur as both an ‘m’ and an ‘f’ and sum the amount for each year.

Any help would be greatly appreciated

回答1:

I changed the data a bit, so that there would be common names in some years. On visual inspection this seems to work. The syntax may appear a bit cryptic if you are not familiar with data.table but it's concise and should be quite efficient:

require(data.table)
set.seed(1)
data = data.frame(year=sample(1950:1960, 100, replace=TRUE),name=sample(LETTERS, 100, replace=TRUE),
                  sex=sample(c("M", "F"), 100, replace=TRUE), amount=sample(40:100, 100,  replace=TRUE))
setDT(data) ## change to data.table format
data[, common=sum(amount[name %in% intersect(name[sex=="M"], name[sex=="F"])]), by=year]