I have a data frame with eight variables. I would like to calculate the average mean of annual weighted average percent loss. However, not all variables exist for each year in my dataset. What would be the simplest method to do so? Included below is a sample dataset and final desired output.
Thank you!
sample dataset
Fruit.Type Year Primary.Wgt Primary.Loss.PCT Retail.Wgt Retail.Loss.PCT Cons.Wgt Cons.Loss.PCT
Oranges.F 1970 16.16 3.0 15.68 11.6 13.86 36.0
Oranges.F 1971 15.73 3.0 15.26 11.6 13.49 36.0
Oranges.F 1972 14.47 3.0 14.04 11.6 12.41 36.0
Oranges.F 1973 14.43 3.0 14.00 11.6 12.38 36.0
Tangerines.F 1971 2.34 5.0 2.22 20.4 1.80 52.0
Tangerines.F 1972 2.06 5.0 1.96 20.4 1.60 52.0
Tangerines.F 1973 2.07 5.0 1.97 20.4 1.60 52.0
Grapefruit.F 1970 8.22 3.0 7.97 12.8 6.90 20.0
Grapefruit.F 1971 8.55 3.0 8.29 12.8 7.20 20.0
Grapefruit.F 1972 8.56 3.0 8.31 12.8 7.20 20.0
Grapefruit.F 1973 8.57 3.0 8.31 12.8 7.20 20.0
desired output (calc'd in excel)
Output (weighted average percent loss)
Year Primary.Loss.PCT Retail.Loss.PCT Cons.Loss.PCT
1970 3.00 11.82 11.98
1971 3.00 14.95 32.16
1972 3.16 14.66 31.78
1973 3.17 14.68 31.77
Mean 3.08 14.03 26.92
Standard Error 0.048 0.737 4.980
There are many ways. I would prefer via a data.table
.
First convert your data into a data.table
:
require(data.table) #tested in data.table 1.9.4
setDT(mydata)
> mydata
Fruit.Type Year Primary.Wgt Primary.Loss.PCT Retail.Wgt Retail.Loss.PCT
1: Oranges.F 1970 16.16 3 15.68 11.6
2: Oranges.F 1971 15.73 3 15.26 11.6
3: Oranges.F 1972 14.47 3 14.04 11.6
4: Oranges.F 1973 14.43 3 14.00 11.6
5: Tangerines.F 1971 2.34 5 2.22 20.4
6: Tangerines.F 1972 2.06 5 1.96 20.4
7: Tangerines.F 1973 2.07 5 1.97 20.4
8: Grapefruit.F 1970 8.22 3 7.97 12.8
9: Grapefruit.F 1971 8.55 3 8.29 12.8
10: Grapefruit.F 1972 8.56 3 8.31 12.8
11: Grapefruit.F 1973 8.57 3 8.31 12.8
Cons.Wgt Cons.Loss.PCT
1: 13.86 36
2: 13.49 36
3: 12.41 36
4: 12.38 36
5: 1.80 52
6: 1.60 52
7: 1.60 52
8: 6.90 20
9: 7.20 20
10: 7.20 20
11: 7.20 20
Then let's do the group-based aggregation:
mydata2 <- mydata[,list(
Primary.Loss.PCT=sum(Primary.Wgt*Primary.Loss.PCT)/sum(Primary.Wgt),
Retail.Loss.PCT=sum(Retail.Wgt*Retail.Loss.PCT)/sum(Retail.Wgt),
Cons.Loss.PCT=sum(Cons.Wgt*Cons.Loss.PCT)/sum(Cons.Wgt)),
by=Year]
> mydata2
Year Primary.Loss.PCT Retail.Loss.PCT Cons.Loss.PCT
1: 1970 3.000000 12.00440 30.68208
2: 1971 3.175808 12.74412 32.15829
3: 1972 3.164209 12.71970 31.77558
4: 1973 3.165138 12.72471 31.76959
Finally, we compute the mean and se:
> colMeans(mydata2[,-1,with=FALSE])
Primary.Loss.PCT Retail.Loss.PCT Cons.Loss.PCT
3.126289 12.548234 31.596386
> require(plotrix); std.error(mydata2[,-1,with=FALSE])
Primary.Loss.PCT Retail.Loss.PCT Cons.Loss.PCT
0.04217833 0.18135513 0.31804132
I hope I had understand the logic of your computation. However, the final output is different from yours. Anyway, you may adjust the code to follow your needs.