Getting different results using aggregate() and su

Posted 2019-09-19 01:55

Question:

I'm trying to build a summary data frame with the combined total of the variables prop.damage and crop.damage for each STATE. I'm using the aggregate() function in R with the following code:

stormdata$prop.damage <- with(stormdata, ifelse(PROPDMGEXP == 'K', PROPDMG * 10^3,
                                         ifelse(PROPDMGEXP == 'M', PROPDMG * 10^6,
                                         ifelse(PROPDMGEXP == 'B', PROPDMG * 10^9, NA))))
stormdata$crop.damage <- with(stormdata, ifelse(CROPDMGEXP == 'K', CROPDMG * 10^3,
                                         ifelse(CROPDMGEXP == 'M', CROPDMG * 10^6,
                                         ifelse(CROPDMGEXP == 'B', CROPDMG * 10^9, NA))))
damagecost <- with(stormdata, aggregate(x = prop.damage + crop.damage, by = list(STATE), FUN = sum, na.rm = TRUE))
damagecost <- damagecost[order(damagecost$x, decreasing = TRUE), ]

Here PROPDMGEXP and CROPDMGEXP hold exponent codes ('K', 'M', 'B') that act as multipliers for the numeric PROPDMG and CROPDMG columns. Any other code produces NA. My main data set is stormdata.
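
The nested ifelse() calls can also be written as a lookup in a named vector. This is a minimal sketch, assuming the exponent columns contain the codes 'K', 'M' and 'B' as character or factor values. The helper name to_dollars and the toy data are hypothetical. As in the ifelse() version, any other code, including '' or NA, maps to NA.

```r
# Multipliers for the exponent codes used in the question.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a numeric column by its exponent code.
# as.character() matters: indexing by a factor would use its integer codes.
# Codes not in the table (e.g. "") and NA look up to NA, like the final NA in ifelse().
to_dollars <- function(value, exp_code) {
  value * unname(multipliers[as.character(exp_code)])
}

# Toy data standing in for stormdata (not the real data set).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)
toy$prop.damage <- with(toy, to_dollars(PROPDMG, PROPDMGEXP))
toy$prop.damage
```

The lookup table keeps all the codes in one place, so adding a new code means editing one line instead of nesting another ifelse().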

And I get the following:

> head(damagecost)
   Group.1            x
8       CA 120211639720
13      FL  27302948100
38      MS  14804212820
63      TX  12550131850
20      IL  11655920860
2       AL   9505473250

But if I do the addition "manually" for California ('CA'), for example, I get this:

> sum(stormdata$prop.damage[stormdata$STATE == 'CA'], na.rm = TRUE) + sum(stormdata$crop.damage[stormdata$STATE == 'CA'], na.rm = TRUE)
[1] 127115859410
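
The same column-by-column check can be run for every state at once with tapply(). It sums each column separately per state with na.rm = TRUE and then adds the two totals. This is a sketch on hypothetical toy data shaped like stormdata, not on the real data set.

```r
# Toy stand-in for stormdata: one CA row has only property damage,
# another has only crop damage.
toy <- data.frame(
  STATE       = c("CA", "CA", "CA", "FL"),
  prop.damage = c(100, 50, NA, 20),
  crop.damage = c(10, NA, 5, 1)
)

# Sum each column per state on its own (dropping NAs), then add the totals.
manual <- with(toy,
  tapply(prop.damage, STATE, sum, na.rm = TRUE) +
  tapply(crop.damage, STATE, sum, na.rm = TRUE))
manual
```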

I don't understand why I'm getting different results.

Answer 1:

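It turns out that both prop.damage and crop.damage contain NA values. In R, NA + x is NA, so for every row where only one of the two columns is NA, prop.damage + crop.damage is NA. sum(..., na.rm = TRUE) inside aggregate() then drops that row completely, losing the non-missing value as well. The manual calculation sums each column separately with na.rm = TRUE, so it keeps those values, which is why it gives the larger total for CA (127115859410 versus 120211639720).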
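
A small reproduction with hypothetical toy data shows the effect and one possible fix. The fix assumes that a missing component should count as zero: build the per-row total with rowSums(..., na.rm = TRUE) before aggregating. The column name total.damage is illustrative.

```r
toy <- data.frame(
  STATE       = c("CA", "CA", "CA", "FL"),
  prop.damage = c(100, 50, NA, 20),
  crop.damage = c(10, NA, 5, 1)
)

# Original approach: the row-wise "+" turns CA rows 2 and 3 into NA,
# so na.rm = TRUE discards the 50 and the 5 entirely.
wrong <- with(toy, aggregate(x = prop.damage + crop.damage,
                             by = list(STATE), FUN = sum, na.rm = TRUE))

# Fix: add the columns with rowSums(na.rm = TRUE), which treats a missing
# component as 0 (a row with both values missing also becomes 0).
toy$total.damage <- rowSums(toy[, c("prop.damage", "crop.damage")], na.rm = TRUE)
fixed <- aggregate(total.damage ~ STATE, data = toy, FUN = sum)

wrong
fixed
```

Note that the formula interface of aggregate() drops incomplete rows by default (na.action = na.omit). A variant such as aggregate(cbind(prop.damage, crop.damage) ~ STATE, ...) would therefore need na.action = na.pass, together with na.rm = TRUE, to keep rows that have only one value missing.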

Tags: sum aggregate