Matrix A is my starting matrix. It holds the data logged from my MPU6050 and GPS on an SD card (Latitude, Longitude, Time, Ax, Ay, Az, Gx, Gy, Gz).
I calculated the standard deviation of Az over a window of size 5 and identified all the elements that satisfy a condition (> threshold).
Then, in a matrix "large_windows", I stored the indices of all the Az values in the windows that satisfy the condition.
From "large_windows" I built a new matrix B with all the rows of matrix A that contain the "large_windows" elements.
I think my code works, but it is very ugly and chaotic. I am also still not very comfortable with indexing, but I want to learn it.
1. Does a better solution exist?
2. Is it possible to use logical indexing? How? Is it efficient?
Here is my code. It is a simplified example with a generic condition, so that the whole concept is easier to understand and not only my specific situation. It starts from suggestions to a previous problem (how to create a sliding window).
%random matrix nXm
a=rand(100,6);
%window dimension
window_size=4;
%overlap between two windows
overlap=1;
%increment needed
step=window_size - overlap;
%std threshold
threshold=0.3;
std_vals= NaN(size(a,1),1);
%The sliding window will analyze only the 5th column
for i=1: step: (size(a,1)-window_size)
std_vals(i)=std(a(i:(i+window_size-1),5));
end
% finding the rows with standard deviation larger than threshold
large_indexes = find(std_vals>threshold);
%Storing all the elements that are inside the window with std>threshold
large_windows = zeros(numel(large_indexes), window_size);
for i=1:window_size
large_windows(:,i) = large_indexes + i - 1;
end
% Starting extracting all the rows with the 5th column outlier elements
n=numel(large_windows);
%Since I can't know how long my dataset will be,
%I need to know the "index distance" between two adjacent elements
% in the same row [e.g. a(1,1) and a(1,2)]
diff1=sub2ind(size(a),1,1);
diff2=sub2ind(size(a),1,2);
l_2_a_r_e = diff2-diff1 %length two adjacent row elements
large_windows=large_windows'
%calculating all the indices of the elements of the i-th row containing an anomaly
for i=1:n
B{i}=[a(large_windows(i))-l_2_a_r_e*4 a(large_windows(i))-l_2_a_r_e*3 a(large_windows(i))-l_2_a_r_e*2 a(large_windows(i))-l_2_a_r_e*1 a(large_windows(i))-l_2_a_r_e*0 a(large_windows(i))+l_2_a_r_e];
end
C= cell2mat(B');
I also read some questions before posting, but this one was too specific.
B is not included in A, so this question is not helpful: Find complement of a data frame (anti - join).
I don't know how to use ismember in this specific case.
I hope my drawing explains my problem better :)
Thanks for your time.
Here's a new approach to achieve the result that you actually wanted. I corrected 2 mistakes that you made and replaced all the for-loops with bsxfun, which is a very efficient function for this kind of task. For Matlab R2016b or newer you can also use implicit expansion instead of bsxfun.
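As a small illustration of that equivalence, both calls below should build the same 4-by-3 matrix of window rows. The variable names and toy values are made up for this example, and the second form assumes R2016b or newer:

```matlab
% Hypothetical toy values, not taken from the question's data
starts  = [1 4 7];      % starting row of each window (row vector)
offsets = (0:3).';      % offsets inside a window (column vector)

% bsxfun expands the row vector and the column vector against each other
rowsBsxfun = bsxfun(@plus, starts, offsets);

% Implicit expansion (R2016b+) does the same with a plain +
rowsImplicit = starts + offsets;

% Each column of the result holds the rows of one window
sameResult = isequal(rowsBsxfun, rowsImplicit);
```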
My approach starts at your implementation of the sliding window. Instead of your for-loop, you can use
stdInds=bsxfun(@plus,1:step:(size(a,1)-window_size+1),(0:window_size-1).');
std_vals=std(a(sub2ind(size(a),stdInds,repmat(5,size(stdInds)))));
here. The bsxfun call creates an array that holds the rows of your windows, one window per column. These rows need to be transformed into linear indices of the a-array in order to get an array of values that can be passed to the std-function. In your implementation you made a small mistake here: your for-loop ends at size(a,1)-window_size, so it misses the last window. The last valid starting row is size(a,1)-window_size+1. For the dimensions in your example, ending at size(a,1)-overlap yields the same windows, but for other sizes that bound can run past the end of a.
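One way to sanity-check the vectorized version is to compare it with a plain loop over the same window starts. This is only a sketch: it assumes size(a,1)-window_size+1 as the last valid starting row, and the names starts, stdLoop, colVals and stdVec are introduced here for illustration.

```matlab
a = rand(100,6);
window_size = 4;
overlap = 1;
step = window_size - overlap;

% Starting row of every window that fits completely inside a
starts = 1:step:(size(a,1) - window_size + 1);

% Loop version: one std per window, computed on column 5
stdLoop = zeros(1, numel(starts));
for k = 1:numel(starts)
    stdLoop(k) = std(a(starts(k):starts(k)+window_size-1, 5));
end

% Vectorized version: one window per column, std works column-wise
stdInds = bsxfun(@plus, starts, (0:window_size-1).');
colVals = reshape(a(stdInds, 5), size(stdInds));
stdVec  = std(colVals);

% Largest difference between both versions (zero up to rounding)
maxDiff = max(abs(stdLoop - stdVec));
```

Here a(stdInds,5) is used instead of sub2ind; both should pick the same values, so this is a matter of taste.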
Now that we have the std values of the windows, we can check which ones are greater than your predefined threshold and then transform them back into the corresponding rows:
highStdWindows=find(std_vals>threshold);
highStdRows=bsxfun(@plus,highStdWindows*step-step+1,(0:3).');
highStdWindows contains the indices of the windows that have high std values. In the next line, we calculate the starting rows of these windows using highStdWindows*step-step+1 and then we calculate the other rows that belong to each window using bsxfun again.
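A worked example of that mapping, with hypothetical window indices (windows 2 and 5 are assumed to exceed the threshold; startRows is a name introduced here):

```matlab
window_size = 4;
step = 3;                   % window_size - overlap, as in the question
highStdWindows = [2 5];     % hypothetical result of find(...)

% Window k starts at row (k-1)*step + 1, which equals k*step - step + 1
startRows = highStdWindows*step - step + 1;

% Add the offsets 0..window_size-1 to get every row of each window;
% each column of highStdRows holds the rows of one window
highStdRows = bsxfun(@plus, startRows, (0:window_size-1).');
```

With these values, startRows is [4 13] and the columns of highStdRows are rows 4 to 7 and rows 13 to 16.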
Now we get to the actual mistake in your code. This line right here
B{i}=[a(large_windows(i))-l_2_a_r_e*4 a(large_windows(i))-l_2_a_r_e*3 a(large_windows(i))-l_2_a_r_e*2 a(large_windows(i))-l_2_a_r_e*1 a(large_windows(i))-l_2_a_r_e*0 a(large_windows(i))+l_2_a_r_e];
does not do what you wanted it to do. Unfortunately you misplaced a couple of brackets here. This way you take the large_windows(i)'th element of matrix a and subtract 4*l_2_a_r_e from it. What you wanted to write was
B{i}=[a(large_windows(i)-l_2_a_r_e*4) % and so on
This way you would subtract 4*l_2_a_r_e from the index that you pass to a. This would still be wrong, because in large_windows you stored row numbers and not linear indices corresponding to matrix a.
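The difference between a row number and a linear index can be seen on a small matrix. This sketch uses magic(4) as stand-in data and the hypothetical names r, c and lin:

```matlab
a = magic(4);     % small example matrix, not the question's data
r = 2;            % a row number
c = 3;            % a column number

% A row number used as the only index is read as a linear index,
% so a(r) addresses row r of the FIRST column
firstColValue = a(r);

% To address row r, column c with one index, convert the subscripts first
lin = sub2ind(size(a), r, c);     % equals (c-1)*size(a,1) + r
sameElement = a(lin);

% Moving one column to the right adds size(a,1) to the linear index
rightNeighbour = a(lin + size(a,1));
```

This is why adding or subtracting multiples of l_2_a_r_e only makes sense once the row numbers have been turned into linear indices.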
Nevertheless this can be achieved a lot easier using subscripted indexing instead of linear indexing:
rowList=reshape(highStdRows,1,[]);
C=a(rowList,:); % all columns (:) and from the rows in rowList
These two easy lines tell Matlab to take all rows that are stored in highStdRows with all columns (expressed by the :). With this, if there are two adjacent windows with high std values, you will get the overlapping rows twice. If you don't want that, you can use this code instead:
rowList=unique(reshape(highStdRows,1,[]));
C=a(rowList,:);
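Regarding the question about logical indexing: the same selection can probably also be written with a logical mask. This is a sketch, not part of the approach above; mask is a name introduced here and the rows in highStdRows are hypothetical.

```matlab
a = rand(100,6);
highStdRows = [4 13; 5 14; 6 15; 7 16];   % hypothetical rows of two windows

% One logical flag per row of a, true where the row belongs to a window
mask = false(size(a,1), 1);
mask(highStdRows(:)) = true;

% Logical indexing keeps every flagged row once, in its original order
C = a(mask, :);
```

Because a row is either flagged or not, overlapping windows cannot produce duplicate rows, so the result matches the unique variant.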
If you want further insight into how indexing in Matlab works, take a look at LuisMendo's post about this topic.