I'm trying to find an efficient, numerically stable algorithm to calculate a rolling variance (for instance, a variance over a 20-period rolling window). I'm aware of the Welford algorithm that efficiently computes the running variance for a stream of numbers (it requires only one pass), but am not sure if this can be adapted for a rolling window. I would also like the solution to avoid the accuracy problems discussed at the top of this article by John D. Cook. A solution in any language is fine.
This is just a minor addition to the excellent answer provided by DanS. The following equations are for removing the oldest sample from the window and updating the mean and variance. This is useful, for example, if you want to take smaller windows near the right edge of your input data stream (i.e. just remove the oldest window sample without adding a new sample).

With n samples currently in the window, mean m, and sum of squared deviations S (so the variance is S/n, or S/(n-1) for the sample variance), removing a sample is just Welford's update run in reverse:

    m_new = (n*m - x_old) / (n - 1)
    S_new = S - (x_old - m) * (x_old - m_new)

Here, x_old is the oldest sample in the window you wish to remove.
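A minimal sketch of that reversal in Python (the function name and the var_sum bookkeeping are my assumptions, not part of the original answer):

    def remove_oldest(n, mean, var_sum, x_old):
        # Reverse Welford's update: drop x_old from a window of n samples.
        # var_sum is the running sum of squared deviations about the mean.
        new_mean = (n * mean - x_old) / (n - 1)
        new_var_sum = var_sum - (x_old - mean) * (x_old - new_mean)
        return n - 1, new_mean, new_var_sum

The variance of the shrunken window is then new_var_sum / (n - 1), or new_var_sum / (n - 2) for the sample variance.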
Here's a divide and conquer approach that has O(log k)-time updates, where k is the number of samples. It should be relatively stable for the same reasons that pairwise summation and FFTs are stable, but it's a bit complicated and the constant isn't great.

Suppose we have a sequence A of length m with mean E(A) and variance V(A), and a sequence B of length n with mean E(B) and variance V(B). Let C be the concatenation of A and B. We have

    E(C) = (m*E(A) + n*E(B)) / (m + n)
    V(C) = (m*(V(A) + (E(A) - E(C))^2) + n*(V(B) + (E(B) - E(C))^2)) / (m + n)

Now, stuff the elements in a red-black tree, where each node is decorated with the mean and variance of the subtree rooted at that node. Insert on the right; delete on the left. (Since we're only accessing the ends, a splay tree might be O(1) amortized, but I'm guessing amortized is a problem for your application.) If k is known at compile-time, you could probably unroll the inner loop FFTW-style.
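A minimal sketch of just the merge step in Python (the tree bookkeeping is omitted, and the combine function with its (count, mean, variance) tuples is my assumption, not part of the answer):

    def combine(a, b):
        # Merge (count, mean, variance) summaries of two sequences into
        # the summary of their concatenation, using the formulas above.
        # Variances here are population variances.
        m, ea, va = a
        n, eb, vb = b
        ec = (m * ea + n * eb) / (m + n)
        vc = (m * (va + (ea - ec) ** 2) + n * (vb + (eb - ec) ** 2)) / (m + n)
        return (m + n, ec, vc)

Each tree node would store combine() of its children's summaries, so inserting or deleting a sample only recomputes the O(log k) summaries on the path back to the root.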
I have been dealing with the same issue. Mean is simple to compute iteratively, but you need to keep the complete history of values in a circular buffer.
I have adapted Welford's algorithm and it works for all the values that I have tested with.
To get the current variance, just divide varSum by the window size:
variance = varSum / window_size;
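Here is a minimal sketch of such an adaptation (a deque as the circular buffer, and the names RollingVariance and var_sum, are my assumptions):

    from collections import deque

    class RollingVariance:
        def __init__(self, window_size):
            self.window_size = window_size
            self.buffer = deque()   # complete history of the window's values
            self.mean = 0.0
            self.var_sum = 0.0      # running sum of squared deviations

        def push(self, x_new):
            if len(self.buffer) < self.window_size:
                # Plain Welford accumulation until the window is full.
                self.buffer.append(x_new)
                delta = x_new - self.mean
                self.mean += delta / len(self.buffer)
                self.var_sum += delta * (x_new - self.mean)
            else:
                # Window full: swap the oldest sample for the new one.
                x_old = self.buffer.popleft()
                self.buffer.append(x_new)
                old_mean = self.mean
                self.mean += (x_new - x_old) / self.window_size
                self.var_sum += (x_new - x_old) * (x_new - self.mean + x_old - old_mean)

        def variance(self):
            # As described above: divide varSum by the window size
            # (assumes the window is full).
            return self.var_sum / self.window_size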
I know this question is old, but in case someone else is interested, here follows the Python code. It is inspired by johndcook's blog post and by @Joachim's and @DanS's code and @Jaime's comments. The code below still gives small imprecisions for small window sizes. Enjoy.
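A sketch of what such code might look like (not the original listing; the generator form and the names are my assumptions):

    from collections import deque

    def rolling_variance(values, window_size):
        # Yield the sample variance of each full window as values stream in.
        window = deque()
        mean = 0.0
        var_sum = 0.0   # running sum of squared deviations
        for x_new in values:
            if len(window) < window_size:
                # Welford accumulation while the window fills.
                window.append(x_new)
                delta = x_new - mean
                mean += delta / len(window)
                var_sum += delta * (x_new - mean)
            else:
                # Replace the oldest sample (DanS's update).
                x_old = window.popleft()
                window.append(x_new)
                old_mean = mean
                mean += (x_new - x_old) / window_size
                var_sum += (x_new - x_old) * (x_new - mean + x_old - old_mean)
            if len(window) == window_size:
                yield var_sum / (window_size - 1)

For example, list(rolling_variance([1, 2, 3, 4, 5], 3)) yields [1.0, 1.0, 1.0], the sample variance of each 3-sample window.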
If you prefer code over words (heavily based on DanS' post): http://calcandstuff.blogspot.se/2014/02/rolling-variance-calculation.html
I guess keeping track of your 20 samples, Sum(X^2 from 1..20), and Sum(X from 1..20) and then successively recomputing the two sums at each iteration isn't efficient enough? It's possible to recompute the new variance without adding up, squaring, etc., all of the samples each time.
As in:
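A sketch of that update (sum_x, sum_x2, and n are assumed names; n = 20 here):

    # When x_new enters the window and x_old (the sample from 20
    # iterations ago) leaves it, update the two running sums:
    sum_x += x_new - x_old
    sum_x2 += x_new * x_new - x_old * x_old

    # Sample variance of the current window of n samples:
    variance = (sum_x2 - sum_x * sum_x / n) / (n - 1)

Note that this is the textbook sum-of-squares formula, so it is exactly the form that can suffer the cancellation problems discussed in the article the question links to; the Welford-style updates above behave better in that respect.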