Optimize MATLAB code (nested for loop to compute s

2020-06-06 04:53发布

I am computing a similarity matrix based on Euclidean distance in MATLAB. My code is as follows:

for i=1:N % M,N is the size of the matrix x for whose elements I am computing similarity matrix
 for j=1:N
  D(i,j) = sqrt(sum(x(:,i)-x(:,j)).^2)); % D is the similarity matrix
 end
end

Can any help with optimizing this = reducing the for loops as my matrix x is of dimension 256x30000.

Thanks a lot!

--Aditya

4条回答
贼婆χ
2楼-- · 2020-06-06 05:32

The function to do so in matlab is called pdist. Unfortunately it is painfully slow and doesnt take Matlabs vectorization abilities into account.

The following is code I wrote for a project. Let me know what kind of speed up you get.

   Qx=repmat(dot(x,x,2),1,size(x,1));
   D=sqrt(Qx+Qx'-2*x*x');

Note though that this will only work if your data points are in the rows and your dimensions the columns. So for example lets say I have 256 data points and 100000 dimensions then on my mac using x=rand(256,100000) and the above code produces a 256x256 matrix in about half a second.

查看更多
叛逆
3楼-- · 2020-06-06 05:43

There's probably a better way to do it, but the first thing I noticed was that you could cut the runtime in half by exploiting the symmetry D(i,j)==D(i,j)

You can also use the function norm(x(:,i)-x(:,j),2)

查看更多
家丑人穷心不美
4楼-- · 2020-06-06 05:54

To start with, you are computing twice as much as you need to here, because D will be symmetric. You don't need to calculate the (i,j) entry and the (j,i) entry separately. Change your inner loop to for j=1:i, and add in the body of that loop D(j,i)=D(i,j);

After that, there's really not much redundancy left in what that code does, so your only room left for improvement is to parallelize it: if you have the Parallel Computing Toolbox, convert your outer loop to a parfor and before you run it, say matlabpool(n), where n is the number of threads to use.

查看更多
放荡不羁爱自由
5楼-- · 2020-06-06 05:57

I think this is what you're looking for.

D=zeros(N);    
jIndx=repmat(1:N,N,1);iIndx=jIndx'; %'# fix SO's syntax highlighting
D(:)=sqrt(sum((x(iIndx(:),:)-x(jIndx(:),:)).^2,2));

Here, I have assumed that the distance vector, x is initalized as an NxM array, where M is the number of dimensions of the system and N is the number of points. So if your ordering is different, you'll have to make changes accordingly.

查看更多
登录 后发表回答