I have been searching the web for methods to create rolling windows so that I can perform a time-series cross-validation technique known as Walk Forward Analysis in a generalized manner.
However, I have not come across any solution that offers flexibility in both 1) the window size (almost all methods have this; for example, pandas rolling or, somewhat differently, np.roll) and 2) the window rolling quantity, understood as how many indices the window is rolled forward each time (I haven't found any method that supports this).
I have been trying to write optimized, concise code with the help of @coldspeed in this answer (I can't comment there because I don't have the required reputation yet; hope to get there soon!), but I haven't been able to incorporate the window rolling quantity.
What I have tried:
1. I have tried np.roll together with my example below, with no success.
2. I have also tried to modify the code below by multiplying the ith value, but I can't fit it within the list comprehension, which I would like to keep.
3. The example below works great for any window size, BUT it only "rolls" the window one step ahead, and I would like to generalize it to any step.
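The idea in item 2 (scaling the index) can be made to fit inside a list comprehension: iterate over window numbers instead of start rows, and multiply each number by the step to get the start row. A minimal sketch, assuming a hypothetical helper step_windows whose step argument is the window rolling quantity, and keeping only full-length windows:

```python
import numpy as np

def step_windows(arr, window, step):
    """Return full-length windows of `arr` (along axis 0) whose starts are `step` rows apart."""
    # Number of full windows: starts 0, step, 2*step, ... while start + window <= len(arr)
    n_windows = (len(arr) - window) // step + 1 if len(arr) >= window else 0
    # Window number i starts at row i * step
    return [arr[i * step: i * step + window] for i in range(n_windows)]

arr = np.arange(30).reshape(10, 3)
windows = step_windows(arr, window=5, step=2)
print([int(w[0, 0]) // 3 for w in windows])  # start rows: [0, 2, 4]
```

With step=1 this yields the same 6 windows as the sliceListX comprehension in the example code.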
So, is there any way to have these two parameters available within the list comprehension approach? Or is there any other resource I did not find that makes this easier? All help is very much appreciated. My example code is the following:
In [1]: import numpy as np
In [2]: arr = np.random.random((10,3))
In [3]: arr
Out[3]: array([[0.38020065, 0.22656515, 0.25926935],
[0.13446667, 0.04386083, 0.47210474],
[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887],
[0.4371343 , 0.08905587, 0.74511753]])
In [4]: inSamplePercentage = 0.4
In [5]: outSamplePercentage = 0.3 * inSamplePercentage
In [6]: windowSizeTrain = round(inSamplePercentage * arr.shape[0])
In [7]: windowSizeTest = round(outSamplePercentage * arr.shape[0])
In [8]: windowTrPlusTs = windowSizeTrain + windowSizeTest
In [9]: sliceListX = [arr[i: i + windowTrPlusTs] for i in range(len(arr) - (windowTrPlusTs-1))]
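Since the goal is walk-forward analysis, the same step can be applied while splitting each window into its in-sample and out-of-sample parts. A minimal sketch, assuming a hypothetical generator walk_forward_splits and that only full windows (train plus test rows) should be used:

```python
import numpy as np

def walk_forward_splits(arr, train_size, test_size, step):
    """Yield (train, test) pairs from windows of train_size + test_size rows.

    Consecutive windows start `step` rows apart; only full windows are used.
    """
    total = train_size + test_size
    for start in range(0, len(arr) - total + 1, step):
        window = arr[start:start + total]
        # First train_size rows are in-sample, the remaining rows out-of-sample
        yield window[:train_size], window[train_size:]

arr = np.random.random((10, 3))
# Same sizes as the session above: round(0.4 * 10) = 4 train rows, round(0.12 * 10) = 1 test row
for train, test in walk_forward_splits(arr, train_size=4, test_size=1, step=2):
    print(train.shape, test.shape)
```

A generator keeps memory use low because each split is produced only when the loop asks for it.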
Given a window length of 5 and a window rolling quantity of 2, I would expect something like this:
Out [15]:
[array([[0.38020065, 0.22656515, 0.25926935],
[0.13446667, 0.04386083, 0.47210474],
[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102]]),
array([[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358]]),
array([[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887]]),
array([[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887],
[0.4371343 , 0.08905587, 0.74511753]])]
(This includes the last array, even though its length is less than 5.)
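The output above can be reproduced by stopping as soon as a window reaches the last row: a start i is kept only if the previous window (starting at i - step) ended before the last row, i.e. i < len(arr) - window + step, and i is still inside the array. A minimal sketch, assuming a hypothetical helper step_windows_with_tail and a non-empty array:

```python
import numpy as np

def step_windows_with_tail(arr, window, step):
    """Windows of up to `window` rows, `step` rows apart (assumes a non-empty array).

    Stops once a window reaches the last row, so the final window may be shorter.
    """
    n = len(arr)
    # Keep start i if it is inside the array and the previous window did not
    # already reach the last row; always keep at least the first window
    stop = max(min(n, n - window + step), 1)
    return [arr[i:i + window] for i in range(0, stop, step)]

arr = np.random.random((10, 3))
print([len(w) for w in step_windows_with_tail(arr, window=5, step=2)])  # [5, 5, 5, 4]
```

With step=1 no short tail window appears, so the result matches sliceListX.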
OR:
Out [16]:
[array([[0.38020065, 0.22656515, 0.25926935],
[0.13446667, 0.04386083, 0.47210474],
[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102]]),
array([[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358]]),
array([[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887]])]
(Only the arrays with length == 5. However, this could be derived from the output above with a simple mask.)
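For the full-window variant, a vectorized alternative (a different technique from the list comprehension) is numpy.lib.stride_tricks.sliding_window_view, available since NumPy 1.20: it builds every window of length 5 as a read-only view without copying, and slicing with [::step] keeps every step-th window. A minimal sketch; note that the function appends the window axis last, so it is moved to position 1 here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

arr = np.random.random((10, 3))
window, step = 5, 2

# All full windows along axis 0; the window axis is appended last: shape (6, 3, 5)
views = sliding_window_view(arr, window, axis=0)
# Move the window axis next to the first one -> (6, 5, 3), then keep every `step`-th window
windows = np.moveaxis(views, -1, 1)[::step]
print(windows.shape)  # (3, 5, 3)
```

Because the result is a read-only view, call .copy() before modifying any window.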
EDIT: I forgot to mention this too: something could be done if pandas rolling objects supported an __iter__ method.
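Recent pandas releases do support iterating over rolling objects, but a plain generator over DataFrame.iloc slices gives stepped windows without depending on how iteration interacts with other rolling options, and it keeps each window's (e.g. date) index. A minimal sketch, assuming a hypothetical generator iter_step_windows that yields only full windows:

```python
import numpy as np
import pandas as pd

def iter_step_windows(df, window, step):
    """Yield full-length row windows of `df`, `step` rows apart, keeping the index."""
    for start in range(0, len(df) - window + 1, step):
        yield df.iloc[start:start + window]

df = pd.DataFrame(np.random.random((10, 3)),
                  index=pd.date_range("2020-01-01", periods=10, freq="D"))
for w in iter_step_windows(df, window=5, step=2):
    # First and last date covered by each window
    print(w.index[0].date(), "->", w.index[-1].date())
```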