Getting different results using aggregate() and su

Posted 2019-09-19 01:47

I'm trying to get a summary data frame of the total quantities of the variables prop.damage and crop.damage by STATE variable using the aggregate() function in R with the following code:

# Scale each damage figure by its exponent code (K, M, B); any other code yields NA
stormdata$prop.damage <- with(stormdata,
  ifelse(PROPDMGEXP == 'K', PROPDMG * 10^3,
  ifelse(PROPDMGEXP == 'M', PROPDMG * 10^6,
  ifelse(PROPDMGEXP == 'B', PROPDMG * 10^9, NA))))
stormdata$crop.damage <- with(stormdata,
  ifelse(CROPDMGEXP == 'K', CROPDMG * 10^3,
  ifelse(CROPDMGEXP == 'M', CROPDMG * 10^6,
  ifelse(CROPDMGEXP == 'B', CROPDMG * 10^9, NA))))
# Total damage per state, sorted from highest to lowest
damagecost <- with(stormdata, aggregate(x = prop.damage + crop.damage, by = list(STATE), FUN = sum, na.rm = TRUE))
damagecost <- damagecost[order(damagecost$x, decreasing = TRUE), ]

Here the PROPDMGEXP and CROPDMGEXP variables are used as multipliers for the numeric variables PROPDMG and CROPDMG, respectively. My main data set is stormdata.

And I get the following:

> head(damagecost)
   Group.1            x
8       CA 120211639720
13      FL  27302948100
38      MS  14804212820
63      TX  12550131850
20      IL  11655920860
2       AL   9505473250

But if, for example, I do the addition "manually" for California ('CA'), I get this:

> sum(stormdata$prop.damage[stormdata$STATE == 'CA'], na.rm = TRUE) + sum(stormdata$crop.damage[stormdata$STATE == 'CA'], na.rm = TRUE)
[1] 127115859410

I don't understand why I'm getting different results.

Tags: sum aggregate
Reply #2 · 2019-09-19 02:18

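It turns out that both prop.damage and crop.damage contained NA values, and those NAs affected the result when the two variables were added inside aggregate(). The expression prop.damage + crop.damage is NA for every row where either variable is NA, so na.rm = TRUE discards the whole row, including the value that was not missing. Summing each variable separately, as in the manual calculation, drops only the NA entries themselves, which is why that total is larger.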
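To make the effect concrete, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and are not taken from the storm data set. One possible fix, assuming a missing damage value should count as zero, is to add the columns with rowSums(..., na.rm = TRUE) before aggregating:

```r
# Toy data (hypothetical values, not from stormdata)
toy <- data.frame(
  STATE       = c("CA", "CA", "CA", "FL"),
  prop.damage = c(1000, NA, 500, 200),
  crop.damage = c(NA, 300, 50, NA)
)

# Adding with `+` gives NA whenever either column is NA,
# so only the third row keeps a value.
row_total <- toy$prop.damage + toy$crop.damage

# na.rm = TRUE then drops those rows entirely, losing 1000, 300 and 200.
agg_rowwise <- aggregate(x = row_total, by = list(STATE = toy$STATE),
                         FUN = sum, na.rm = TRUE)

# Possible fix (assumes NA should be treated as 0):
# rowSums() with na.rm = TRUE ignores only the missing entries.
row_total_fixed <- rowSums(toy[, c("prop.damage", "crop.damage")], na.rm = TRUE)
agg_fixed <- aggregate(x = row_total_fixed, by = list(STATE = toy$STATE),
                       FUN = sum, na.rm = TRUE)

# Same idea as the "manual" calculation in the question, for one state
manual_ca <- sum(toy$prop.damage[toy$STATE == "CA"], na.rm = TRUE) +
  sum(toy$crop.damage[toy$STATE == "CA"], na.rm = TRUE)

print(agg_rowwise)  # CA total is 550 here
print(agg_fixed)    # CA total is 1850, matching manual_ca
```

With this change the per-state totals from aggregate() should agree with the manual sums. Whether treating NA as zero is appropriate depends on what the unrecognised exponent codes in the real data mean.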
