Suppose I have the following data frame:
Website <- rep(paste("Website",1:3),2)
Year <- c(rep(2013,3),rep(2014,3))
V1 <- c(10,20,50,20,30,70)
V2 <- c(5,15,30,15,30,45)
df <- data.frame(Website,Year,V1,V2)
df
    Website Year V1 V2
1 Website 1 2013 10  5
2 Website 2 2013 20 15
3 Website 3 2013 50 30
4 Website 1 2014 20 15
5 Website 2 2014 30 30
6 Website 3 2014 70 45
What I want to find is the growth for each website from 2013 to 2014, i.e. (x1 - x0)/x0, for both variables. This would result in the following data frame:
    Website  V1  V2
1 Website 1 1.0 2.0
2 Website 2 0.5 1.0
3 Website 3 0.4 0.5
These are just the growth rates for each website for both variables, V1 and V2.
Assuming that you have more years, dplyr handles it beautifully.

A data.table option (I am using data.table_1.9.5, which introduced the function shift). Assuming that the Year column is ordered, convert the data.frame to a data.table using setDT, loop through the columns ("V1", "V2") with lapply (specify the columns in .SDcols), and do the calculation for the individual columns (x/shift(x)...). The default settings for shift are type='lag' and n=1L. If you want to remove the NA rows, you can use na.omit, which is also fast in the devel version.
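
The data.table steps described above could be sketched roughly as follows. This is my reading of the description, not necessarily the exact code intended: grouping by Website, the explicit setorder call, and writing the calculation as x/shift(x) - 1 (which equals (x1 - x0)/x0) are assumptions on my part. I also use as.data.table instead of setDT so that the original df is left untouched.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  Website = rep(paste("Website", 1:3), 2),
  Year    = c(rep(2013, 3), rep(2014, 3)),
  V1      = c(10, 20, 50, 20, 30, 70),
  V2      = c(5, 15, 30, 15, 30, 45)
)

# as.data.table() makes a copy; setDT(df) would convert df by reference instead
dt <- as.data.table(df)

# Make sure the years are in order within each website (assumption: not guaranteed)
setorder(dt, Website, Year)

# shift(x) defaults to type = "lag", n = 1L, so x / shift(x) - 1 is the
# growth relative to the previous row within each website
growth <- dt[, lapply(.SD, function(x) x / shift(x) - 1),
             by = Website, .SDcols = c("V1", "V2")]

# The first year of each website has no previous value, so drop the NA rows
growth <- na.omit(growth)
print(growth)
```

With the example data this should leave one row per website, with V1 growth of 1.0, 0.5, 0.4 and V2 growth of 2.0, 1.0, 0.5, matching the expected output in the question.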
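
For the dplyr route mentioned earlier, a minimal sketch might look like the one below. It is only one possible way to write it and assumes dplyr 1.0.0 or later (for across); the choice of arrange, group_by and lag is mine.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Website = rep(paste("Website", 1:3), 2),
  Year    = c(rep(2013, 3), rep(2014, 3)),
  V1      = c(10, 20, 50, 20, 30, 70),
  V2      = c(5, 15, 30, 15, 30, 45)
)

growth <- df %>%
  arrange(Website, Year) %>%   # order years within each website
  group_by(Website) %>%
  # (x1 - x0) / x0, where x0 is the previous year's value in the same group
  mutate(across(c(V1, V2), ~ (.x - dplyr::lag(.x)) / dplyr::lag(.x))) %>%
  ungroup() %>%
  filter(!is.na(V1))           # drop the first year, which has no predecessor

print(growth)
```

Because the Year column is kept, each remaining row is labelled with the end year of the period, which is what makes this extend naturally to more than two years.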