I have a dataset for price of the ticker on the stock exchange: time - price. But intervals between data points are not equal - from 1 to 2 minutes.
What is the best practice to calculate moving average for such case?
How to make it in Matlab?
I tend to think, that weights of the points should depend on the time interval that was last since previous point. Does we have function in Matlab to calculate moving average with custom weights of the points?
Here is an example of the "naive" approach I mentioned in the comments above:
% some data (unequally spaced in time, but monotonically non-decreasing)
t = sort(rand(50,1));
x = cumsum(rand(size(t))-0.5);
% linear interpolatation on equally-spaced intervals
tt = linspace(min(t), max(t), numel(t));
xx = interp1(t, x, tt, 'linear');
% plot two data vectors
plot(t, x, 'b.-', tt, xx, 'r.:')
legend({'original', 'equally-spaced'})
My answer is quite similar to lakesh's one. But I will think your problem in terms of interpolation.
First of all, a moving average, or a time average of a function, is the integral of it over a time period, divided by the time length.
In your case, the integral can be seen as a sum, since most generally in each minute the function value is the same. However, your data has unequal time intervals. This can be seen as missing points of the function. Let me explain: for each minute x
, you should have a price f(x)
. But for some times say x=5
, f(x)
is undefined.
One of the ways you can get rid of discontinuities of a function is interpolation - assign some value to the missing points, according to some rules of calculation. The simpliest algorithm is "keeping the previous value", which is essentially lakesh's idea.
But the benefit of thinking in this aspect lies in the ability to make your data more accurate. It may not apply to a stock market case, but should be true generally, such as a temperature measuring or wind speed, which is guaranteed to smoothly change over the time (rather than keeping constant for 2 minutes and suddenly change in one second). You can use different interpolation techniques to polish the data. "Polishing" in this sense is ok because in any way you have to use the concept of "average". A good interpolation should make the data closer to a model that has been proven to work with the real problem.
CODE - I set the max interval to 5 minutes to show huge difference between the two methods. It depends on your observation and experience to decide which (or any other) method is the best to "predict the past".
% reproduce your scenario
N = 20;
max_interval = 5;
time = randi(max_interval,N,1);
time(1) = 1; % first minute
price = randi(10,N,1);
figure(1)
plot(cumsum(time), price, 'ko-', 'LineWidth', 2);
hold on
% "keeping-previous-value" interpolation
interp1 = zeros(sum(time),1)-1;
interp1(cumsum(time)) = price;
while ismember(-1, interp1)
interp1(interp1==-1) = interp1(find(interp1==-1)-1);
end
plot(interp1, 'bx--')
% "midpoint" interpolation
interp2 = zeros(sum(time),1)-1;
interp2(cumsum(time)) = price;
for ii = 1:length(interp2)
if interp2(ii) == -1
t1 = interp2(ii-1);
t2 = interp2( find(interp2(ii:end)>-1, 1, 'first') +ii-1);
interp2(ii) = (t1+t2)/2;
end
end
plot(interp2, 'rd--')
% "modified-midpoint" interpolation
interp3 = zeros(sum(time),1)-1;
interp3(cumsum(time)) = price;
for ii = 1:length(interp3)
if interp3(ii) == -1
t1 = interp3(ii-1);
t2 = interp3( find(interp3(ii:end)>-1, 1, 'first') +ii-1);
alpha = 1 / find(interp3(ii:end)>-1, 1, 'first');
interp3(ii) = (1-alpha)*t1 + alpha*t2;
end
end
plot(interp3, 'm^--')
hold off
legend('original data', 'interp 1', 'interp 2', 'interp 3')
fprintf(['"keeping-previous-value" (weighted sum) \n', ...
' result: %2.4f \n'], mean(interp1));
fprintf(['"midpoint" (linear interpolation) \n', ...
' result: %2.4f \n'], mean(interp2));
fprintf(['"modified-midpoint" (linear interpolation) \n', ...
' result: %2.4f \n'], mean(interp3));
Note: undefined points should be presented by NaN
, but -1
seems easier to play with.
This is my suggestion.
Since you have unequal intervals of data, convert it into equal intervals of data keeping the price constant between unequal intervals.
Then you can use tsmovavg to calculate the moving average of the price series then.
If you are willing to discretize the time value of your data points, the solution should be very straightforward. No matter what kind of window you choose, as long as it's Lipschitz, it can be computed or approximated in amortized O(1) time for each data point or time step using approaches like summed area table.
Else, use a rectangular running window of fixed width that only 'snaps' to data points. Specifically, update the summation of values of all data points within the window only when a data point is joining/leaving the window.
However, if you want to use custom weights for your data points, the method described above no longer works. You can, of course, approximate your spatial kernel with multiple box functions. Otherwise, you might want to look into general bilateral filtering algorithms, as the problem can be formulated as bilateral filtering with a constant range kernel. See the paper Adaptive Manifolds for Real-Time High-Dimensional Filtering for a recently developed algorithm that's relatively easy to implement on this topic. The author's website also provides code in MATLAB.