This is my df (data.frame):
group value
1 10
1 20
1 25
2 5
2 10
2 15
I need to calculate the difference between values in consecutive rows, by group.
So I need this result:
group value diff
1 10 NA # because there is no previous value
1 20 10 # value[2] - value[1]
1 25 5 # value[3] - value[2]
2 5 NA # because the group changed
2 10 5 # value[5] - value[4]
2 15 5 # value[6] - value[5]
I can handle this problem using ddply, but it takes too much time because my df has a lot of groups (over 1,000,000).
Are there other, more efficient approaches to this problem?
Try this with tapply.
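The code for this answer did not survive, so the following is only a minimal sketch of how a tapply approach might look, not necessarily what the answerer wrote. It rebuilds the example df from the question and assumes the rows are already sorted by group, because unlist() concatenates the per-group results in group order.

```r
# Example data from the question
df <- data.frame(group = rep(1:2, each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# For each group, prepend NA to the within-group consecutive differences
diffs <- tapply(df$value, df$group, function(x) c(NA, diff(x)))

# Assumption: df is sorted by group, so the concatenated results
# line up with the original rows
df$diff <- unlist(diffs, use.names = FALSE)
```

If the rows are not sorted by group, sort them first (for example with order(df$group)) or use an approach that preserves row order.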
The package data.table can do this fairly quickly, using the shift function.
Or use the lag function in dplyr.
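The two approaches named above might look roughly like this. This is a sketch that assumes both packages are installed and reuses the example df from the question; the object names dt and res are only illustrative.

```r
library(data.table)
library(dplyr)

# Example data from the question
df <- data.frame(group = rep(1:2, each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# data.table: shift() gives the previous value within each group
# (NA for the first row of a group), so the subtraction yields the diff
dt <- as.data.table(df)
dt[, diff := value - shift(value), by = group]

# dplyr: lag() plays the same role inside a grouped mutate();
# the dplyr:: prefix avoids picking up stats::lag by accident
res <- df %>%
  group_by(group) %>%
  mutate(diff = value - dplyr::lag(value)) %>%
  ungroup()
```

Both versions leave NA in the first row of each group, matching the desired output in the question.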
For alternatives that predate data.table::shift and dplyr::lag, see edits.
You can use the base function ave() for this; it returns the result in the original row order.
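The original code and output for this answer are missing, so here is one way the ave() call could be written. It is a sketch using the example df from the question, with the same c(NA, diff(x)) helper idea applied per group.

```r
# Example data from the question
df <- data.frame(group = rep(1:2, each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# ave() applies FUN within each group and puts the results back
# in the positions of the original rows
df$diff <- ave(df$value, df$group, FUN = function(x) c(NA, diff(x)))

# df$diff is now: NA 10 5 NA 5 5
```

Because ave() writes each group's result back to that group's own rows, this version does not require df to be sorted by group.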