I am looking for performance gains in rolling/sliding window functions in R. This is quite a common task that applies to any data set of ordered observations. I would like to share some of my findings; maybe somebody will be able to provide feedback to make it even faster.
An important note: I focus on the case align="right" with an adaptive rolling window, so width is a vector (the same length as our observation vector). When width is a scalar, there are already very well-developed functions in the zoo and TTR packages, which would be very hard to beat (4 years later: it was easier than I expected), as some of them even use Fortran (though user-defined FUNs can still be faster using the wapply mentioned below).
The RcppRoll package is worth mentioning for its great performance, but so far it has no function that handles this adaptive case. It would be great if someone could extend it to do so.
Consider the following data:
x = c(120,105,118,140,142,141,135,152,154,138,125,132,131,120)
plot(x, type="l")
And we want to apply a rolling function over the x vector with a variable rolling window width.
set.seed(1)
width = sample(2:4,length(x),TRUE)
In this particular case the rolling function adapts to widths sampled from c(2,3,4).
We will apply the mean function; expected results:
r = f(x, width, FUN = mean)
print(r)
## [1] NA NA 114.3333 120.7500 141.0000 135.2500 139.5000
## [8] 142.6667 147.0000 146.0000 131.5000 128.5000 131.5000 127.6667
plot(x, type="l")
lines(r, col="red")
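The f() called above is not defined anywhere in the post. A minimal reference sketch, assuming only what the text states (right alignment, one width per observation, NA where there is not enough history), might look like the following. The body of f here is an assumption for illustration, not the original author's code. The widths are written out explicitly because set.seed(1) followed by sample() gives different draws since R 3.6.0 changed the default sampling method; the vector below is the one consistent with the expected output printed above.

```r
# Hypothetical reference implementation of f(): a right-aligned rolling FUN
# where the window width can differ for every observation.
f <- function(x, width, FUN, ...) {
  stopifnot(length(x) == length(width))
  FUN <- match.fun(FUN)
  vapply(seq_along(x), function(i) {
    w <- width[i]
    if (w > i) return(NA_real_)   # not enough history for this window yet
    FUN(x[(i - w + 1L):i], ...)   # window ends at i (align = "right")
  }, numeric(1))
}

x <- c(120, 105, 118, 140, 142, 141, 135, 152, 154, 138, 125, 132, 131, 120)
# Widths consistent with the expected results shown in the post.
width <- c(2, 3, 3, 4, 2, 4, 4, 3, 3, 2, 2, 2, 4, 3)

r <- f(x, width, FUN = mean)
print(round(r, 4))
```

This is deliberately simple and not fast: every window is re-scanned from scratch, which is exactly the cost a tuned solution tries to avoid.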
Any indicator can be used to produce the width argument, giving different variants of adaptive moving averages, or of any other function.
I am looking for top performance.
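For the mean specifically, one well-known way to get speed in plain R is to work from cumulative sums, so that each window costs a single subtraction instead of a fresh scan. The sketch below assumes NA-free input; the name adaptive_mean_cumsum is hypothetical and is not taken from any package or from the answers. The optional cross-check uses data.table's frollmean with adaptive = TRUE and only runs if data.table happens to be installed.

```r
# Adaptive, right-aligned rolling mean via prefix sums (assumes no NAs in x).
# adaptive_mean_cumsum is a hypothetical name used only for this sketch.
adaptive_mean_cumsum <- function(x, width) {
  stopifnot(length(x) == length(width), !anyNA(x))
  n <- length(x)
  cs <- c(0, cumsum(x))            # cs[k + 1] holds sum(x[1:k])
  i <- seq_len(n)
  ok <- width <= i                 # windows that have enough history
  out <- rep(NA_real_, n)
  # sum(x[(i - w + 1):i]) == cs[i + 1] - cs[i - w + 1]
  out[ok] <- (cs[i[ok] + 1L] - cs[i[ok] - width[ok] + 1L]) / width[ok]
  out
}

x <- c(120, 105, 118, 140, 142, 141, 135, 152, 154, 138, 125, 132, 131, 120)
width <- c(2, 3, 3, 4, 2, 4, 4, 3, 3, 2, 2, 2, 4, 3)

r_fast <- adaptive_mean_cumsum(x, width)
print(round(r_fast, 4))

# Optional comparison, skipped when data.table is not available.
if (requireNamespace("data.table", quietly = TRUE)) {
  r_dt <- data.table::frollmean(x, as.integer(width), adaptive = TRUE)
  print(all.equal(r_fast, r_dt))
}
```

The trade-off is numerical: differences of large running totals can lose precision on very long or badly scaled series, and the trick does not carry over to arbitrary functions such as median.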
December 2018 update
An efficient implementation of adaptive rolling functions was recently added to data.table; see the ?froll manual for more information. Additionally, an efficient alternative solution using base R has been identified (fastama below). Unfortunately, Kevin Ushey's answer does not address the question, so it is not included in the benchmark. The scale of the benchmark has been increased, as it is pointless to compare microseconds.
Old answer:
I chose four available solutions that do not need to go to C++ and are quite easy to find or google.
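The code for those solutions is not reproduced here. As a rough idea of what comparing pure-R candidates can look like, the sketch below defines two hypothetical implementations (roll_loop and roll_mapply are names invented for this example, not the solutions that were benchmarked), checks that they agree on prod, and times them with system.time. The data size and repetition count are arbitrary assumptions.

```r
# Candidate 1 (hypothetical): explicit for loop over right-aligned windows.
roll_loop <- function(x, width, FUN) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (width[i] <= i) out[i] <- FUN(x[(i - width[i] + 1L):i])
  }
  out
}

# Candidate 2 (hypothetical): mapply over (window end, window width) pairs.
roll_mapply <- function(x, width, FUN) {
  mapply(function(end, w) {
    if (w <= end) FUN(x[(end - w + 1L):end]) else NA_real_
  }, seq_along(x), width)
}

set.seed(42)
n <- 2000L
x <- runif(n, 0.9, 1.1)                  # values near 1 keep prod() well scaled
width <- sample(2:4, n, replace = TRUE)

res_loop <- roll_loop(x, width, prod)
res_mapply <- roll_mapply(x, width, prod)

# Timings only mean something if the candidates return the same thing.
stopifnot(isTRUE(all.equal(res_loop, res_mapply)))

print(system.time(for (k in 1:20) roll_loop(x, width, prod)))
print(system.time(for (k in 1:20) roll_mapply(x, width, prod)))
```

For anything more careful than a quick look, a dedicated benchmarking package and larger inputs would be the safer choice, since timings at this scale are noisy.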
Below are the timings for the prod function. The mean function might already be optimized inside rollapplyr. All results are equal.
Somehow people have missed the ultra fast runmed() in base R (stats package). It's not adaptive, as far as I understand the original question, but for a rolling median, it's FAST! Comparing here to roll_median() from RcppRoll.
For reference, you should definitely check out RcppRoll if you have only a single window length to 'roll' over:
gives me
It's a bit faster ;) and the package is flexible enough for users to define and use their own rolling functions (with C++). I may extend the package in the future to allow multiple window widths, but I am sure it will be tricky to get right.
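To make the single-width case concrete, here is a small base R sketch of a fixed-width, right-aligned rolling product built on embed(), which lays out every complete window as a matrix row. The helper name roll_prod_embed is hypothetical. The RcppRoll call at the end is only an optional comparison and is skipped when the package is not installed.

```r
# Fixed-width, right-aligned rolling product using embed().
# roll_prod_embed is a hypothetical helper name for this sketch.
roll_prod_embed <- function(x, n) {
  stopifnot(n >= 1L, n <= length(x))
  windows <- embed(x, n)                         # one row per complete window
  c(rep(NA_real_, n - 1L), apply(windows, 1L, prod))
}

x <- c(2, 3, 4, 5, 6)
res <- roll_prod_embed(x, 3)
print(res)   # values: NA NA 24 60 120

# Optional comparison with RcppRoll, only when it is available.
if (requireNamespace("RcppRoll", quietly = TRUE)) {
  print(RcppRoll::roll_prod(x, n = 3L, fill = NA, align = "right"))
}
```

embed() trades memory for simplicity: it materialises a matrix with one column per window position, so it suits small widths better than large ones.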
If you want to define the prod yourself, you can do so: RcppRoll allows you to define your own C++ functions to pass through and generate a 'rolling' function if you'd like. rollit gives a somewhat nicer interface, while rollit_raw just lets you write a C++ function yourself, somewhat like you might do with Rcpp::cppFunction. The philosophy being, you should only have to express the computation you wish to perform on a particular window, and RcppRoll can take care of iterating over windows of some size.
gives me
So really, as long as you are capable of expressing the computation you wish to perform in a particular window through either the rollit interface or with a C++ function passed through rollit_raw (whose interface is a bit rigid, but still functional), you are in good shape.
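The idea of writing only the per-window computation and letting a driver do the iteration can be illustrated in plain R. The sketch below is not RcppRoll's API and makes no claim about how rollit or rollit_raw work internally; make_roller is a hypothetical helper that takes a function of one window and returns a fixed-width rolling function over complete windows.

```r
# make_roller is a hypothetical helper, not part of RcppRoll.
# It turns "what to compute on one window" into a rolling function.
make_roller <- function(window_fun) {
  window_fun <- match.fun(window_fun)
  function(x, n) {
    stopifnot(n >= 1L, n <= length(x))
    starts <- seq_len(length(x) - n + 1L)        # first index of each window
    vapply(starts, function(s) window_fun(x[s:(s + n - 1L)]), numeric(1))
  }
}

# The user only describes the computation for a single window.
roll_range <- make_roller(function(w) max(w) - min(w))
roll_prod2 <- make_roller(prod)

x <- c(2, 3, 4, 5, 6)
print(roll_range(x, 3))   # 2 2 2
print(roll_prod2(x, 3))   # 24 60 120
```

This R version pays interpreter overhead on every window, which is precisely what generating the loop in C++ is meant to remove.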