I have a data frame with 4 columns titled 'year' 'name' 'sex' 'amount'. Here is a sample data set
set.seed(1)
data = data.frame(year=sample(1950:2000, 50, replace=TRUE),name=sample(LETTERS, 50, replace=TRUE),
sex=sample(c("M", "F"), 50, replace=TRUE), amount=sample(40:100, 50, replace=TRUE))
I want to find only names that occur as both an ‘m’ and an ‘f’ and sum the amount for each year.
Any help would be greatly appreciated
I changed the data a bit, so that there would be common names in some years. On visual inspection this seems to work. The syntax may appear a bit cryptic if you are not familiar with data.table
but it's concise and should be quite efficient:
require(data.table)
set.seed(1)
data = data.frame(year=sample(1950:1960, 100, replace=TRUE),name=sample(LETTERS, 100, replace=TRUE),
sex=sample(c("M", "F"), 100, replace=TRUE), amount=sample(40:100, 100, replace=TRUE))
setDT(data) ## change to data.table format
data[, common=sum(amount[name %in% intersect(name[sex=="M"], name[sex=="F"])]), by=year]