I am trying to calculate the z-score for a vector of 5000 rows that has many NaN values. I have to calculate this many times, so I don't want to use a loop; I was hoping to find a vectorized solution.
The loop solution:
for i = 1:size(val,1)
    vec(i,1) = (val(i,1) - nanmean(val(:,1))) / nanstd(val(:,1));
end
A partial vectorized solution:
zscore(vec(~isnan(vec)))
But this returns a vector whose length is that of the original vector minus the number of NaN values, so it is not the same size as the original.
I want to calculate the z-score for the vector and then interpolate the missing data afterwards. I have to do this hundreds of times, so I am looking for a fast vectorized approach.
This is a vectorized solution, shown on some example data with NaNs.
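A minimal sketch of this approach, assuming the Statistics Toolbox functions nanmean and nanstd (the same ones used in the question). The names val and valZscore follow the answer's text; the example values and the intermediate names valMean and valStd are made up for illustration.

```matlab
% Example data with NaNs (values are made up for illustration).
val = [2; 4; NaN; 6; 8; NaN; 10];

% Mean and standard deviation that ignore NaN entries
% (nanmean and nanstd are Statistics Toolbox functions).
valMean = nanmean(val);
valStd  = nanstd(val);

% Both statistics are scalars here, so the subtraction and division
% apply to every element; NaN entries stay NaN, and valZscore has
% the same size as val.
valZscore = (val - valMean) / valStd;
```

Because the NaN positions are preserved, the result can be passed straight to a later interpolation step without re-indexing.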
The column vector valZscore contains the deviations (Z-scores) and has NaN values wherever val, the original measurement data, has NaN values.
A vectorized version of the anonymous function below (assumes observations are in rows, variables in columns):
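One way such a column-wise version might look, as a sketch rather than the answer's exact code. The name nanZcols and the example matrix are hypothetical; the dimension is passed explicitly so that a single-row input is still treated as one observation of several variables.

```matlab
% Column-wise NaN-aware Z-score (observations in rows, variables in columns).
% nanZcols is a hypothetical name. bsxfun expands the 1-by-n row of means
% and the 1-by-n row of standard deviations across all rows of xIn.
nanZcols = @(xIn) bsxfun(@rdivide, ...
                         bsxfun(@minus, xIn, nanmean(xIn, 1)), ...
                         nanstd(xIn, 0, 1));

% Example matrix (values are made up): five observations of two variables.
X = [1 10; 2 NaN; NaN 30; 4 40; 5 50];
Z = nanZcols(X);   % same size as X, NaN wherever X is NaN
```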
Sorry this answer is 6 months late, but for anyone else who comes across this thread:
The accepted answer isn't fully vectorised in that it doesn't do what the real zscore does so beautifully: that is, compute Z-scores along a particular dimension of a matrix.
If you want to calculate Z-scores of a large number of vectors at once, as the OP says he is doing, the best solution is this:
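A sketch of the bsxfun approach, with the dimension written out explicitly. The variable names X, dim, mu, sigma and Z and the example values are assumptions for illustration; nanmean and nanstd are the Statistics Toolbox functions.

```matlab
% Example: each row is one vector to standardize (values are made up).
X = [1 2 NaN 4; 10 NaN 30 50];

dim   = 2;                   % dimension to standardize along
mu    = nanmean(X, dim);     % NaN-ignoring mean along dim
sigma = nanstd(X, 0, dim);   % flag 0 normalizes by N-1; dim is the third argument

% bsxfun expands mu and sigma along dim to match the size of X,
% so every vector is standardized in a single call without a loop.
Z = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
```

Setting dim = 1 instead standardizes each column, which matches the layout where observations are in rows.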
To do it on an arbitrary dimension, just put the dimension inside nanmean and nanstd (for nanstd it is the third argument, after the normalization flag), and bsxfun takes care of the rest.
Anonymous function:
nanZ = @(xIn)(xIn-nanmean(xIn))/nanstd(xIn);
nanZ(vectorWithNans)