I want to calculate a rolling maximum and minimum efficiently, meaning anything better than recalculating the maximum/minimum from all the values in the window every time the window moves.
There was a post on here that asked the same thing, and someone posted a solution involving some kind of stack approach that, judging by its rating, worked. However, I can't find it again for the life of me.
Any help would be appreciated in finding a solution or the post. Thank you all!
The algorithm you want to use is called ascending minima (C++ implementation).
To do this in C#, you will want a double-ended queue class; a good one exists on NuGet under the name Nito.Deque.
I have written a quick C# implementation using Nito.Deque, but I have only briefly checked it and wrote it from memory, so it may be wrong!
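The C# listing is not reproduced here. As a rough illustration of the ascending minima idea, here is a minimal Python sketch using `collections.deque` from the standard library. The function name `rolling_min` and its parameters are my own choices for illustration, not taken from the answer.

```python
from collections import deque

def rolling_min(values, window):
    """Yield the minimum of each full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    # (index, value) pairs; values are strictly increasing from front to back,
    # so the front is always the minimum of the current window.
    candidates = deque()
    for i, v in enumerate(values):
        # Discard candidates that are no smaller than v: they leave the
        # window before v does, so they can never be the minimum again.
        while candidates and candidates[-1][1] >= v:
            candidates.pop()
        candidates.append((i, v))
        # Discard the front candidate once it has slid out of the window.
        if candidates[0][0] <= i - window:
            candidates.popleft()
        # Only report once the first full window has been seen.
        if i >= window - 1:
            yield candidates[0][1]

print(list(rolling_min([4, 2, 12, 11, -5, 7, 3], 3)))
```

Each value is appended and removed at most once, so the total work is linear in the number of values regardless of the window size. A rolling maximum should work the same way with the comparison flipped to `<=`.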
I'm assuming that by "window" you mean a range `a[start]` to `a[start + len]`, and that `start` moves along. Consider the minimal value (the maximal is similar) and the move to the window `a[start + 1]` to `a[start + len + 1]`. Then the minimal value of the window will change only if (a) `a[start + len + 1] < min` (a smaller value came in), or (b) `a[start] == min` (one of the smallest values just left; recompute the minimum).
Another, possibly more efficient, way of doing this is to fill a priority queue with the first window and update it with each value entering/leaving, but I don't think that is much better: priority queues aren't suited to "pick out a random element from the middle", which is what you need to do when advancing the window. The code will also be much more complex. Better to stick to the simple solution until it is proven that the performance isn't acceptable and that this code is responsible for (much of) the resource consumption.
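The simple approach (update the minimum when a smaller value comes in, recompute only when the current minimum leaves) could be sketched in Python as follows. The function name `rolling_min_simple` is hypothetical, and I'm assuming half-open windows `a[start : start + length]` for convenience, which differs slightly from the indexing used in the description.

```python
def rolling_min_simple(a, length):
    """Return the minimum of every window a[start : start + length]."""
    if length <= 0 or length > len(a):
        return []
    current = min(a[:length])          # minimum of the first window
    result = [current]
    for start in range(1, len(a) - length + 1):
        leaving = a[start - 1]             # value that just left the window
        entering = a[start + length - 1]   # value that just came in
        if entering <= current:
            # Case (a): the new value is at least as small as the old minimum.
            current = entering
        elif leaving == current:
            # Case (b): a smallest value just left; rescan the window.
            current = min(a[start:start + length])
        result.append(current)
    return result

print(rolling_min_simple([4, 2, 12, 11, -5, 7, 3], 3))
```

The rescan in case (b) costs time proportional to the window length, so steadily increasing input is the worst case for this version.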
After writing my own algorithm yesterday and asking for improvements, I was referred here. Indeed, this algorithm is more elegant. I'm not sure it offers constant-time calculation regardless of the window size, but in any case I tested its performance against my own caching algorithm (fairly simple, and probably using the same idea as others have proposed). The caching version is 8-15 times faster (tested with rolling windows of 5, 50, 300 and 1000; I don't need more). Below are both alternatives with stopwatches and result validation.
Here's one way to do it more efficiently. You still have to calculate the value occasionally but, other than certain degenerate data (ever decreasing values), that's minimised in this solution.
We'll limit ourselves to the maximum to simplify things but it's simple to extend to a minimum as well.
All you need is the following:
- The current maximum (`max`), initially any value.
- The number of entries in the window equal to that maximum (`maxcount`), initially zero.
The idea is to use `max` and `maxcount` as a cache for holding the current maximum. Where the cache is valid, you only need to return the value in it, a very fast constant-time operation.
If the cache is invalid when you ask for the maximum, it populates the cache and then returns that value. This is slower than the method in the previous paragraph but subsequent requests for the maximum once the cache is valid again use that faster method.
Here's what you do for maintaining the window and associated data:
1. Get the next value `N`.
2. If the window is full, remove the earliest entry `M`. If `maxcount` is greater than 0 and `M` is equal to `max`, decrement `maxcount`. Once `maxcount` reaches 0, the cache is invalid but we don't need to worry about that until such time the user requests the maximum value (there's no point repopulating the cache until then).
3. Add `N` to the rolling window.
4. If the window size is now 1 (that `N` is the only current entry), set `max` to `N` and `maxcount` to 1, then go back to step 1.
5. If `maxcount` is greater than 0 and `N` is greater than `max`, set `max` to `N` and `maxcount` to 1, then go back to step 1.
6. If `maxcount` is greater than 0 and `N` is equal to `max`, increment `maxcount`.
7. Go back to step 1.
Now, at any point while that window management is going on, you may request the maximum value. This is a separate operation, distinct from the window management itself. This can be done using the following rules in sequence.
1. If the window is empty, there is no maximum: raise an exception or return some sensible sentinel value.
2. If `maxcount` is greater than 0, then the cache is valid: simply return `max`.
3. Otherwise, the cache needs to be repopulated. Go through the entire list, setting up `max` and `maxcount` as per the code snippet below.
The fact that you mostly maintain a cache of the maximum value and only recalculate when needed makes this a much more efficient solution than simply recalculating blindly whenever an entry is added.
For some definite statistics, I created the following Python program. It uses a sliding window of size 25 and uses random numbers from 0 to 999 inclusive (you can play with these properties to see how they affect the outcome).
First some initialisation code. Note the `stat` variables, they'll be used to count cache hits and misses:
Then the function to add a number to the window, as per my description above:
Next, the code which returns the maximum value from the window:
And, finally, the test harness:
Note that the test harness attempts to get the maximum each time you add a number to the window. In practice, this may not be needed. In other words, this is the worst-case scenario for the random data generated.
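The program listings themselves are not reproduced here. The following is a compact Python sketch of what such an experiment might look like, written from the description above rather than copied from the original program. The function name `run_trial`, its parameters, and the `hits`/`misses` counters (standing in for the `stat` variables) are all assumptions of mine, and its numbers will not necessarily match the figures quoted in this answer.

```python
import random
from collections import deque

def run_trial(window_size=25, value_range=1000, samples=100_000, seed=None):
    """Feed random numbers through the cached-max window; return (hits, misses)."""
    rng = random.Random(seed)
    window = deque()
    cur_max, maxcount = None, 0
    hits = misses = 0
    for _ in range(samples):
        n = rng.randrange(value_range)   # random value in 0..value_range-1
        # Add n to the window, evicting the earliest entry if it is full.
        if len(window) == window_size:
            m = window.popleft()
            if maxcount > 0 and m == cur_max:
                maxcount -= 1
        window.append(n)
        if len(window) == 1 or (maxcount > 0 and n > cur_max):
            cur_max, maxcount = n, 1
        elif maxcount > 0 and n == cur_max:
            maxcount += 1
        # Request the maximum after every insertion (the worst case).
        if maxcount > 0:
            hits += 1
        else:
            misses += 1
            cur_max = max(window)
            maxcount = window.count(cur_max)
        # Validation only: check the cached answer against a full scan.
        assert cur_max == max(window)
    return hits, misses

hits, misses = run_trial(seed=1)
print(f"hits={hits} misses={misses} miss rate={100 * misses / (hits + misses):.2f}%")
```

Changing `window_size` and `value_range` is the easiest way to see how those properties affect the miss rate.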
Running that program a few times for pseudo-statistical purposes, we get (formatted and analysed for reporting purposes):
So you can see that, on average for random data, only about 3.95% of the cases resulted in a calculation hit (cache miss). The vast majority used the cached values. That should be substantially better than having to recalculate the maximum on every insertion into the window.
Some things that will affect that percentage will be:
- The range of values. Changing it from `0..999` to `0..9` gave a big improvement in reducing cache misses (0.85%).
I would suggest maintaining a stack which supports `getMin()` or `getMax()`. This can be done with two stacks and costs only constant time.
fyi: https://www.geeksforgeeks.org/design-a-stack-that-supports-getmin-in-o1-time-and-o1-extra-space/
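A stack on its own only removes the most recent entry, whereas a sliding window needs to drop the oldest one. One possible reading of this suggestion is therefore to build a queue out of two max-tracking stacks. The Python sketch below takes that reading; the names `MaxStack`, `MaxQueue` and `rolling_max` are hypothetical, and this is the simple version that stores a running maximum with each element, not the O(1) extra space variant from the linked article.

```python
class MaxStack:
    """Stack where each entry also records the maximum of itself and all below."""
    def __init__(self):
        self._items = []   # list of (value, running maximum) pairs

    def push(self, v):
        top_max = v if not self._items else max(v, self._items[-1][1])
        self._items.append((v, top_max))

    def pop(self):
        return self._items.pop()[0]

    def get_max(self):
        return self._items[-1][1]

    def __len__(self):
        return len(self._items)

class MaxQueue:
    """FIFO queue built from two MaxStacks; get_max is O(1), dequeue amortised O(1)."""
    def __init__(self):
        self._in, self._out = MaxStack(), MaxStack()

    def enqueue(self, v):
        self._in.push(v)

    def dequeue(self):
        if not self._out:
            # Reverse the inbound stack so the oldest entry ends up on top.
            while self._in:
                self._out.push(self._in.pop())
        return self._out.pop()

    def get_max(self):
        # max() raises ValueError if both stacks are empty.
        return max(s.get_max() for s in (self._in, self._out) if s)

def rolling_max(values, window):
    q, out = MaxQueue(), []
    for i, v in enumerate(values):
        q.enqueue(v)
        if i >= window:
            q.dequeue()            # drop the value that left the window
        if i >= window - 1:
            out.append(q.get_max())
    return out

print(rolling_max([4, 2, 12, 11, -5, 7, 3], 3))
```

Each value moves from the inbound stack to the outbound stack at most once, which is where the amortised constant cost per step comes from. A `getMin()` version follows by swapping `max` for `min`.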