I am a political science student and learning R. I have a problem with a nested loop, one of my indices being non-numeric.
I have a data frame pwt containing, for each country in the world (column country) and each year from 1950 to 2011 (column year), a number of development indicators, among which is GDP.
I would like to add a column that contains the percentage change in GDP from one year to the next.
Here is the error I get:
Error in `[<-.factor`(`*tmp*`, iseq, value = numeric(0)): replacement has length zero
GDPgrowth = rep("NA", length(pwt$country))
pwt <- cbind.data.frame(pwt, GDPgrowth)
countries <- unique(pwt$country)
for(i in countries) # for each country
{
  for(j in 1951:2011) # for each year
  {
    pwt[pwt$country == i & pwt$year == j, "GDPgrowth"] =
      (pwt[pwt$country == i & pwt$year == j, "rdgpo"] /
         pwt[pwt$country == i & pwt$year == j-1, "rdgpo"] - 1) * 100
  }
}
What did I get wrong?
Welcome to Stack Overflow!
For this sort of rolling or value-over-previous-value calculation you can use zoo, dplyr, or data.table. I personally prefer data.table for its flexibility and its running speed on large datasets. Compared with a loop, all three will generally be faster and more syntactically convenient.
Assuming your data looks something like this (numbers obviously made up):
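The data from the answer is not shown here, so the following is only a stand-in: a tiny made-up panel with two hypothetical countries, "A" and "B". The column names country, year and rdgpo are taken from the question; the numbers are invented.

```r
# Made-up numbers for illustration only; column names follow the question.
pwt <- data.frame(
  country = rep(c("A", "B"), each = 3),   # hypothetical country labels
  year    = rep(1950:1952, times = 2),
  rdgpo   = c(100, 110, 121, 200, 210, 189)
)
pwt
```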
You can use data.table's shift to calculate values from leading/lagging values. In this case:
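A minimal sketch of the shift approach, assuming one row per country and year with no missing years. The data is made up and the countries "A" and "B" are hypothetical; rdgpo is the GDP column name from the question.

```r
library(data.table)

# Made-up data; `rdgpo` is the GDP column name used in the question.
pwt <- data.table(
  country = rep(c("A", "B"), each = 3),
  year    = rep(1950:1952, times = 2),
  rdgpo   = c(100, 110, 121, 200, 210, 189)
)

# Sort so that the previous row within a country is the previous year.
setorder(pwt, country, year)

# shift() lags by one row by default (n = 1, type = "lag"), so the first
# year of every country has no predecessor and gets NA.
pwt[, GDPgrowth := (rdgpo / shift(rdgpo) - 1) * 100, by = country]
pwt
```

With these made-up numbers the growth column comes out as NA, 10, 10 for A and NA, 5, -10 for B, up to floating-point rounding. If some years are missing for a country, the lagged row is no longer the previous year, so that case needs extra care.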
Gives:
Another way would be to use diff from base R. It calculates the difference between consecutive values, so it gives you the difference between consecutive years' GDP, which you can easily use to find the percentage growth.
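To show what diff returns, here is a small sketch on a single made-up vector. It is not the full per-country solution, and it assumes the values are already ordered by year.

```r
# Made-up GDP values for one country, ordered by year.
gdp <- c(100, 110, 121, 108.9)

# diff() returns x[i] - x[i - 1], so the result is one element shorter.
change <- diff(gdp)

# Dividing by the earlier value of each pair turns it into a percentage.
growth_pct <- change / head(gdp, -1) * 100
growth_pct
```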
PS: SO is meant to help people out, not to spoon-feed exact solutions. This answer therefore points you in a direction rather than giving you the exact solution.
You can also avoid the loop:
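The code for this answer is not shown here. One loop-free possibility in base R, offered as a sketch rather than as what the answerer wrote, is ave, which applies a function within each country and returns a vector aligned with the rows. The data is made up.

```r
# Made-up data; `rdgpo` is the GDP column name used in the question.
pwt <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(1950:1952, times = 2),
  rdgpo   = c(100, 110, 121, 200, 210, 189)
)

# Order rows so consecutive rows within a country are consecutive years.
pwt <- pwt[order(pwt$country, pwt$year), ]

# Within each country: NA for the first year, then percentage change.
pwt$GDPgrowth <- ave(
  pwt$rdgpo, pwt$country,
  FUN = function(x) c(NA, diff(x) / head(x, -1) * 100)
)
pwt
```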
By the same token, another convenient solution that avoids the loop can be achieved with dplyr.
As a side point, I would suggest that, in line with the SO guidelines, you provide a reproducible example. For the major publicly available statistical repositories (Eurostat, OECD, World Bank, etc.) there are R packages and tutorials that make sourcing the desired data effortless. In the example above I'm using the WDI package to source the World Bank data.
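A sketch of the dplyr idea. To keep it runnable offline it uses a small made-up data frame instead of downloading World Bank data through WDI; the column names follow the question, and it assumes there are no gaps in the years.

```r
library(dplyr)

# Made-up data standing in for a real download.
pwt <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(1950:1952, times = 2),
  rdgpo   = c(100, 110, 121, 200, 210, 189)
)

pwt <- pwt %>%
  arrange(country, year) %>%          # previous row = previous year
  group_by(country) %>%               # never lag across countries
  mutate(GDPgrowth = (rdgpo / dplyr::lag(rdgpo) - 1) * 100) %>%
  ungroup()
pwt
```

Calling dplyr::lag explicitly avoids picking up stats::lag by accident, which behaves differently.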
Edit
Finally, if you insist on doing this in a loop, you can do it like this:
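The loop itself is not shown here; the following is a sketch of what such a loop could look like, on made-up data. The variable names now and prev are only for illustration.

```r
# Made-up data; `rdgpo` is the GDP column name used in the question.
pwt <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(1950:1952, times = 2),
  rdgpo   = c(100, 110, 121, 200, 210, 189)
)

# Start from a numeric NA column rather than the string "NA".
pwt$GDPgrowth <- NA_real_

for (i in unique(pwt$country)) {
  for (j in sort(unique(pwt$year[pwt$country == i]))) {
    # which() gives the row positions for this country/year pair; it is
    # empty when that combination is not in the data.
    now  <- which(pwt$country == i & pwt$year == j)
    prev <- which(pwt$country == i & pwt$year == j - 1)
    # Assign only when both rows exist, so a zero-length value is never used.
    if (length(now) == 1 && length(prev) == 1) {
      pwt$GDPgrowth[now] <- (pwt$rdgpo[now] / pwt$rdgpo[prev] - 1) * 100
    }
  }
}
pwt
```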
The solution could be less explicit, but I wanted to emphasise the need to pick the right row for each combination of year and country, which is implemented in the which statement.
Benchmarking
The loop approach appears to be rather inefficient:
Benchmarked functions from above
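The benchmarked functions and their timings are not shown here. As a rough sketch of how such a comparison could be set up, the code below times a loop version against a vectorised base R version using system.time. The function names growth_loop and growth_vec, the 50 hypothetical countries and the random GDP values are all assumptions made for illustration; actual timings depend on the machine and the data.

```r
set.seed(1)

# Made-up panel: 50 hypothetical countries, 1950 to 2011, random GDP values.
pwt <- expand.grid(year = 1950:2011,
                   country = sprintf("C%02d", 1:50),
                   stringsAsFactors = FALSE)
pwt$rdgpo <- runif(nrow(pwt), min = 100, max = 1000)

# Loop version: look up the current and previous row for each pair.
growth_loop <- function(d) {
  d$GDPgrowth <- NA_real_
  for (i in unique(d$country)) {
    for (j in 1951:2011) {
      now  <- which(d$country == i & d$year == j)
      prev <- which(d$country == i & d$year == j - 1)
      if (length(now) == 1 && length(prev) == 1) {
        d$GDPgrowth[now] <- (d$rdgpo[now] / d$rdgpo[prev] - 1) * 100
      }
    }
  }
  d
}

# Vectorised version: per-country diff, written back in original row order.
growth_vec <- function(d) {
  o <- order(d$country, d$year)
  d$GDPgrowth <- NA_real_
  d$GDPgrowth[o] <- ave(
    d$rdgpo[o], d$country[o],
    FUN = function(x) c(NA, diff(x) / head(x, -1) * 100)
  )
  d
}

system.time(res_loop <- growth_loop(pwt))
system.time(res_vec  <- growth_vec(pwt))

# Both approaches should agree before their timings are compared.
all.equal(res_loop$GDPgrowth, res_vec$GDPgrowth)
```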