I am computing a similarity matrix based on Euclidean distance in MATLAB. My code is as follows:
for i=1:N % M,N is the size of the matrix x for whose elements I am computing similarity matrix
for j=1:N
D(i,j) = sqrt(sum(x(:,i)-x(:,j)).^2)); % D is the similarity matrix
end
end
Can any help with optimizing this = reducing the for loops as my matrix x
is of dimension 256x30000
.
Thanks a lot!
--Aditya
The function to do so in matlab is called pdist. Unfortunately it is painfully slow and doesnt take Matlabs vectorization abilities into account.
The following is code I wrote for a project. Let me know what kind of speed up you get.
Note though that this will only work if your data points are in the rows and your dimensions the columns. So for example lets say I have 256 data points and 100000 dimensions then on my mac using x=rand(256,100000) and the above code produces a 256x256 matrix in about half a second.
There's probably a better way to do it, but the first thing I noticed was that you could cut the runtime in half by exploiting the symmetry
D(i,j)==D(i,j)
You can also use the function
norm(x(:,i)-x(:,j),2)
To start with, you are computing twice as much as you need to here, because D will be symmetric. You don't need to calculate the (i,j) entry and the (j,i) entry separately. Change your inner loop to
for j=1:i
, and add in the body of that loopD(j,i)=D(i,j);
After that, there's really not much redundancy left in what that code does, so your only room left for improvement is to parallelize it: if you have the Parallel Computing Toolbox, convert your outer loop to a
parfor
and before you run it, saymatlabpool(n)
, wheren
is the number of threads to use.I think this is what you're looking for.
Here, I have assumed that the distance vector,
x
is initalized as anNxM
array, whereM
is the number of dimensions of the system andN
is the number of points. So if your ordering is different, you'll have to make changes accordingly.