I'm trying to get a summary data frame of the total quantities of the variables prop.damage and crop.damage by the STATE variable using the aggregate() function in R with the following code:
stormdata$prop.damage <- with(stormdata, ifelse(PROPDMGEXP == 'K', (PROPDMG * 10^3), ifelse(PROPDMGEXP == 'M', (PROPDMG * 10^6), ifelse(PROPDMGEXP == 'B', (PROPDMG * 10^9), NA))))
stormdata$crop.damage <- with(stormdata, ifelse(CROPDMGEXP == 'K', (CROPDMG * 10^3), ifelse(CROPDMGEXP == 'M', (CROPDMG * 10^6), ifelse(CROPDMGEXP == 'B', (CROPDMG * 10^9), NA))))
damagecost <- with(stormdata, aggregate(x = prop.damage + crop.damage, by = list(STATE), FUN = sum, na.rm = TRUE))
damagecost <- damagecost[order(damagecost$x, decreasing = TRUE), ]
Here the PROPDMGEXP and CROPDMGEXP variables are used as multipliers for the PROPDMG and CROPDMG numeric variables. My main data set is stormdata.
And I get the following:
> head(damagecost)
   Group.1            x
8       CA 120211639720
13      FL  27302948100
38      MS  14804212820
63      TX  12550131850
20      IL  11655920860
2       AL   9505473250
But if, for example, I do the addition "manually" for California ('CA'), I get this:
> sum(stormdata$prop.damage[stormdata$STATE == 'CA'], na.rm = TRUE) + sum(stormdata$crop.damage[stormdata$STATE == 'CA'], na.rm = TRUE)
[1] 127115859410
I don't understand why I'm getting different results.
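It turns out that both variables, prop.damage and crop.damage, had NA values within them, and those NAs were affecting the result when the variables were added inside the aggregate() call. Because prop.damage + crop.damage is evaluated row by row, any row where either value is NA yields NA, and na.rm = TRUE then discards that row's non-missing value as well. The "manual" version sums each column separately, so it only drops the NA entries themselves.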
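One possible way to make the two totals agree is to treat a missing damage value as 0 before adding the columns. The sketch below uses a small made-up data frame (toy) and a hypothetical helper (zero_na), neither of which comes from the original stormdata set. It assumes that a missing damage figure should count as zero, which may or may not be appropriate for the real data.

```r
# Toy data with hypothetical values (not taken from stormdata)
toy <- data.frame(
  STATE       = c("CA", "CA", "FL"),
  prop.damage = c(100, 200, 50),
  crop.damage = c(NA, 10, NA)
)

# Original approach: 100 + NA is NA, so na.rm = TRUE drops the whole row,
# including the non-missing prop.damage value
with_na <- with(toy, aggregate(x = prop.damage + crop.damage,
                               by = list(STATE), FUN = sum, na.rm = TRUE))

# Hypothetical helper: replace NA with 0 (assumes missing means "no damage")
zero_na <- function(v) ifelse(is.na(v), 0, v)

# Adding after the replacement keeps every non-missing value
fixed <- with(toy, aggregate(x = zero_na(prop.damage) + zero_na(crop.damage),
                             by = list(STATE), FUN = sum))

print(with_na)
print(fixed)
```

On this toy data the first call loses the 100 for CA and the 50 for FL, while the second keeps both, matching what summing each column separately would give.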