I have data that looks like this:
sample start end gene coverage
X 1 10 A 5
X 11 20 A 10
Y 1 10 A 5
Y 11 20 A 10
X 1 10 B 5
X 11 20 B 10
Y 1 10 B 5
Y 11 20 B 10
I added additional columns:
data$length <- (data$end - data$start + 1)
data$ct_lt <- (data$length * data$coverage)
I reformatted my data using dcast:
casted <- dcast(data, gene ~ sample, value.var = "coverage", fun.aggregate = mean)
So my new data looks like this:
gene X Y
A 10.00000 10.00000
B 38.33333 38.33333
This is the format I want, but I would like to aggregate differently. Instead of a plain mean, I would like a weighted average, with coverage weighted by length:
sum(ct_lt) / sum(length)
How do I go about doing this?
Disclosure: I don't have R in front of me, but I think the dplyr and tidyr packages are your friends here.
There are certainly lots of ways to accomplish this, but I think the following might get you started:
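Here is a minimal sketch of the idea: compute the weighted average per gene/sample pair first, then spread the samples into columns. It assumes dplyr >= 1.0 and tidyr >= 1.0 are installed. The column name wt_cov is just a placeholder I made up, and the data frame is rebuilt from the sample rows in the question.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question
data <- data.frame(
  sample   = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
  start    = c(1, 11, 1, 11, 1, 11, 1, 11),
  end      = c(10, 20, 10, 20, 10, 20, 10, 20),
  gene     = c("A", "A", "A", "A", "B", "B", "B", "B"),
  coverage = c(5, 10, 5, 10, 5, 10, 5, 10)
)

casted <- data %>%
  # Same helper columns as in the question
  mutate(length = end - start + 1,
         ct_lt  = length * coverage) %>%
  # One group per cell of the final table
  group_by(gene, sample) %>%
  # Length-weighted mean coverage; wt_cov is a hypothetical column name
  summarise(wt_cov = sum(ct_lt) / sum(length), .groups = "drop") %>%
  # One row per gene, one column per sample
  pivot_wider(names_from = sample, values_from = wt_cov)

print(casted)
```

Every interval in the sample rows has length 10, so each cell should come out as 7.5 here, the same as the plain mean. The two only differ once the interval lengths vary. The summarise step could also be written as weighted.mean(coverage, length). If you would rather stay with dcast, the summarised long table can be passed to it with value.var = "wt_cov", since each gene/sample pair then has exactly one value.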
Hope this helps...