I need to write a function that works like this:
N1 = size(X,1);
N2 = size(Xtrain,1);
Dist = zeros(N1,N2);
for i=1:N1
    for j=1:N2
        Dist(i,j)=D-sum(X(i,:)==Xtrain(j,:));
    end
end
(X and Xtrain are sparse logical matrices.)
It works fine and passes the tests, but I don't think it is an efficient or well-written solution.
How can I improve this function using built-in MATLAB functions? I'm completely new to MATLAB, so I don't know whether there is a way to make it better.
Since you want to learn about vectorization, here is some code to study that compares different implementations of this pair-wise distance.
First we build two binary matrices as input (where each row is an instance):
m = 5;
n = 4;
p = 3;
A = double(rand(m,p) > 0.5);
B = double(rand(n,p) > 0.5);
1. double-loop over each pair of instances
D0 = zeros(m,n);
for i=1:m
    for j=1:n
        D0(i,j) = sum(A(i,:) ~= B(j,:)) / p;
    end
end
2. PDIST2
D1 = pdist2(A, B, 'hamming');
3. single-loop over each instance against all other instances
D2 = zeros(m,n);
for j=1:n
    D2(:,j) = sum(bsxfun(@ne, A, B(j,:)), 2) ./ p;
end
4. vectorized with grid indexing, all against all
D3 = zeros(m,n);
[x,y] = ndgrid(1:m,1:n);
D3(:) = sum(A(x(:),:) ~= B(y(:),:), 2) ./ p;
5. vectorized in third dimension, all against all
D4 = sum(bsxfun(@ne, A, reshape(B.',[1 p n])), 2) ./ p;
D4 = permute(D4, [1 3 2]);
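As a side note, on MATLAB R2016b or later (and in Octave) element-wise operators expand singleton dimensions implicitly, so the bsxfun calls in methods 3 and 5 can probably be dropped. The following is only a sketch under that assumption; the names E0, E2 and E4 are made up here to avoid clashing with the variables above, and the inputs are regenerated so the snippet runs on its own:

```matlab
% Same kind of input as above, regenerated so this snippet is self-contained
m = 5; n = 4; p = 3;
A = double(rand(m,p) > 0.5);
B = double(rand(n,p) > 0.5);

% Method 3 without bsxfun: A (m-by-p) is compared against one row of B (1-by-p)
E2 = zeros(m,n);
for j = 1:n
    E2(:,j) = sum(A ~= B(j,:), 2) ./ p;
end

% Method 5 without bsxfun: m-by-p against 1-by-p-by-n expands to m-by-p-by-n
E4 = sum(A ~= reshape(B.', [1 p n]), 2) ./ p;
E4 = permute(E4, [1 3 2]);

% Reference values from the explicit double loop
E0 = zeros(m,n);
for i = 1:m
    for j = 1:n
        E0(i,j) = sum(A(i,:) ~= B(j,:)) / p;
    end
end
assert(isequal(E0, E2, E4))
```

On older releases the bsxfun versions above remain the way to go.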
Finally, we check that all methods give the same result:
assert(isequal(D0,D1,D2,D3,D4))
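One more variant that is not among the methods above, offered as a rough sketch rather than a definitive solution: for 0/1 data the number of mismatches between two rows can be obtained from a matrix product, since mismatches = (ones in row a) + (ones in row b) - 2 * (positions where both are 1). The names both, nDiff and D5 are made up for this example. If D in the question is the number of columns (an assumption, the question does not say), then nDiff is the same quantity as D - sum(X(i,:)==Xtrain(j,:)); for logical input you would convert with double() first.

```matlab
% Same kind of input as above, regenerated so this snippet is self-contained
m = 5; n = 4; p = 3;
A = double(rand(m,p) > 0.5);
B = double(rand(n,p) > 0.5);

% both(i,j) = number of positions where A(i,:) and B(j,:) are both 1
both = A * B.';

% nDiff(i,j) = number of positions where A(i,:) and B(j,:) differ
nDiff = bsxfun(@plus, sum(A,2), sum(B,2).') - 2*both;

% Normalize by the number of columns to match the methods above
D5 = nDiff ./ p;

% Reference values from the explicit double loop
D0 = zeros(m,n);
for i = 1:m
    for j = 1:n
        D0(i,j) = sum(A(i,:) ~= B(j,:)) / p;
    end
end
assert(isequal(D0, D5))
```

This version never builds an m-by-p-by-n intermediate array, which may matter for large inputs, but it is worth timing it against the other methods on your own data before relying on it.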