R getting rid of nested for loops

Posted 2019-07-18 05:25

Question:

I searched quite a bit for a way to simplify the code for the problem below, but without success. I assume that some kind of apply-magic could speed things up a little, but so far I still have difficulties with these kinds of functions.

I have a data.frame data, structured as follows:

year   iso3c gdpppc   elec solid liquid   heat
2010    USA    1567   1063  1118   835    616
2015    USA    1571     NA    NA    NA     NA
2020    USA    1579     NA    NA    NA     NA
 ...    USA     ...     NA    NA    NA     NA
2100    USA    3568     NA    NA    NA     NA
2010    ARG     256    145    91    85     37
2015    ARG     261     NA    NA    NA     NA
2020    ARG     270     NA    NA    NA     NA
 ...    ARG     ...     NA    NA    NA     NA
2100    ARG     632     NA    NA    NA     NA

As you can see, I have a historical starting value for 2010 and a complete scenario for gdpppc up to 2100. I want to let the values for elec, solid, liquid and heat grow according to some elasticity with respect to the development of gdpppc, but separately for each country (coded in iso3c). I have the elasticities defined in a separate data.frame parameters:

  item value
  elec   0.5
liquid   0.2
 solid  -0.1
  heat   0.1

So far I am using a nested for loop:

for (e in seq_len(nrow(parameters))) {
  item <- as.character(parameters[e, "item"])   # column to project, as a plain string

  for (cty in unique(as.character(data$iso3c))) {
    # rows of one country, restricted to the columns needed here
    tmp <- data[data$iso3c == cty, c("year", "iso3c", "gdpppc", item)]

    # base-year value times cumulative growth factors (assumes rows are sorted by year)
    tmp[tmp$year %in% seq(2015, 2100, 5), item] <-
      tmp[tmp$year == 2010, item] *
      cumprod(1 + (tmp[tmp$year %in% seq(2015, 2100, 5), "gdpppc"] /
      tmp[tmp$year %in% seq(2010, 2095, 5), "gdpppc"] - 1) * parameters[e, "value"])

    # write the projected values back into the full data.frame
    data[data$iso3c == cty & data$year %in% seq(2015, 2100, 5), item] <- tmp[tmp$year > 2010, item]
  }
}

The outer loop iterates over the columns and the inner one over the countries (I have 180+ of them). For each country, a subset containing that country's data and the variable of interest is selected. Then I let the variable grow with a certain elasticity with respect to growth in gdpppc, and finally write the subset back into data. I have already tried to run the outer loop in parallel using foreach but was not successful in recombining the results. Since I have to run similar calculations quite often, I would be very grateful for any help.
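To make the growth rule itself easier to see, here is a minimal sketch that isolates it for a single series. The helper name project_series is hypothetical (it does not appear in the loop above), and the sketch assumes the gdpppc values are already sorted by year with the base year first.

```r
# Hypothetical helper: project one series forward from its base-year value.
# Assumes `gdp` is ordered by year and its first element is the base year.
project_series <- function(base, gdp, elasticity) {
  growth <- gdp[-1] / gdp[-length(gdp)] - 1        # period-on-period growth of gdpppc
  base * cumprod(c(1, 1 + growth * elasticity))    # first element stays at the base value
}

# ARG gdpppc for 2010, 2015, 2020, taken from the table in the question
gdp_arg  <- c(256, 261, 270)
elec_arg <- project_series(base = 145, gdp = gdp_arg, elasticity = 0.5)
```

With a helper like this, the body of the inner loop reduces to one call per country and column, which is also the piece that a grouped operation would need to apply.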

Thanks

Answer 1:

Here's one way. Note that I renamed your parameters data.frame to p.

library(data.table)
library(reshape2)

dt <- data.table(data)
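# long format: one row per (year, iso3c, gdpppc, variable)
dt.melt = melt(dt,id=1:3)
dt.melt[,value:=as.numeric(value)]    # coerce value column to numeric
# per country and variable: base-year value plus the change in gdpppc
# since the base year, scaled by that variable's entry in p
dt.melt[,value:=head(value,1)+(gdpppc-head(gdpppc,1))*p[p$item==variable,]$value,
         by="iso3c,variable"]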
result <- dcast(dt.melt,iso3c+year+gdpppc~variable)
result
#   iso3c year gdpppc   elec  solid liquid  heat
# 1   ARG 2010    256  145.0   91.0   85.0  37.0
# 2   ARG 2015    261  147.5   90.5   86.0  37.5
# 3   ARG 2020    270  152.0   89.6   87.8  38.4
# 4   ARG 2100    632  333.0   53.4  160.2  74.6
# 5   USA 2010   1567 1063.0 1118.0  835.0 616.0
# 6   USA 2015   1571 1065.0 1117.6  835.8 616.4
# 7   USA 2020   1579 1069.0 1116.8  837.4 617.2
# 8   USA 2100   3568 2063.5  917.9 1235.2 816.1

The basic idea is to use the melt(...) function to reshape your original data into "long" format, where the values in the four columns solid, liquid, elec, and heat are all in one column, value, and the column variable indicates which metric value refers to. Now, using data tables, you can fill in the values easily. Then, reshape the result back into wide format using dcast(...).
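Note that the fill-in expression in the code above adds the absolute change in gdpppc times the parameter, which is not the same as the compounding rule in the question's loop. As a rough sketch of how the same melt / fill / dcast pattern could apply the compounding rule instead, assuming only data.table is loaded, rows can be ordered by year within each country, and the parameters are held in a table with the illustrative column names variable and elasticity:

```r
library(data.table)

# Toy data in the question's layout (three years only; values from the post)
dt <- data.table(
  year   = rep(c(2010, 2015, 2020), times = 2),
  iso3c  = rep(c("USA", "ARG"), each = 3),
  gdpppc = c(1567, 1571, 1579, 256, 261, 270),
  elec   = c(1063, NA, NA, 145, NA, NA),
  solid  = c(1118, NA, NA,  91, NA, NA),
  liquid = c( 835, NA, NA,  85, NA, NA),
  heat   = c( 616, NA, NA,  37, NA, NA)
)

# Elasticities; the column names here are an assumption made for the join below
p <- data.table(variable   = c("elec", "liquid", "solid", "heat"),
                elasticity = c(0.5, 0.2, -0.1, 0.1))

# Wide -> long, then attach each variable's elasticity
long <- melt(dt, id.vars = c("year", "iso3c", "gdpppc"), variable.factor = FALSE)
long <- merge(long, p, by = "variable")
setorder(long, iso3c, variable, year)   # base year must come first in each group

# Base-year value times the cumulative product of the growth factors
long[, value := value[1] *
         cumprod(c(1, 1 + (gdpppc[-1] / gdpppc[-.N] - 1) * elasticity[-1])),
     by = .(iso3c, variable)]

# Long -> wide again
result <- dcast(long, iso3c + year + gdpppc ~ variable, value.var = "value")
```

The grouping by iso3c and variable replaces both loops, so the per-country subsetting and the write-back step disappear.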