I've got a dataframe containing rates for 'live' treatments and rates for 'killed' treatments. I'd like to subtract the killed treatments from the live ones:
df <- data.frame(id1 = gl(2, 3, labels = c("a", "b")),
                 id2 = rep(gl(3, 1, labels = c("live1", "live2", "killed")), 2),
                 y = c(10, 10, 1, 12, 12, 2),
                 otherFactor = gl(3, 2))
I'd like to subtract the values of y for which id2=="killed" from all the other values of y, separated by the levels of id1, while preserving otherFactor. I would end up with:
id1 id2 y otherFactor
a live1 9 1
a live2 9 1
b live1 10 2
b live2 10 3
This almost works:
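# ddply() comes from plyr; melt() comes from reshape2 (or the older reshape)
library(plyr)
library(reshape2)
df_minusKill <- ddply(df, .(id1), function(x) x$y[x$id2 != "killed"] - x$y[x$id2 == "killed"])
names(df_minusKill) <- c("id1", "live1", "live2")
df_minusKill_melt <- melt(df_minusKill, measure.vars = c("live1", "live2"))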
except that you lose the values of otherFactor. Maybe I could use merge to put the values of otherFactor back in, but in reality I have about a dozen "otherFactor" columns, so it would be less cumbersome to just keep them in there automatically.
The by function can process sections of a dataframe separately by factor (or you could use lapply(split(df, ...))). You could assign the result to a column in df and then keep only the rows with id2 not equal to 'killed'.
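For illustration, here is a minimal base-R sketch of the split-and-recombine idea. It is not necessarily the exact code originally intended here. It assumes each level of id1 has exactly one 'killed' row, and the names pieces and result are made up for this example.

```r
df <- data.frame(id1 = gl(2, 3, labels = c("a", "b")),
                 id2 = rep(gl(3, 1, labels = c("live1", "live2", "killed")), 2),
                 y = c(10, 10, 1, 12, 12, 2),
                 otherFactor = gl(3, 2))

# Split by id1, subtract that group's killed value from every y,
# then drop the killed row. All other columns ride along untouched.
# Assumption: exactly one "killed" row per level of id1.
pieces <- lapply(split(df, df$id1), function(x) {
  x$y <- x$y - x$y[x$id2 == "killed"]
  x[x$id2 != "killed", ]
})

# Stack the per-group pieces back into one data frame.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Because each piece is still a full data frame, otherFactor and any number of similar columns are kept without a merge step.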