How to Build a Distance Matrix without a Loop (Vec

2020-02-12 10:50发布

问题:

I have many points and I want to build distance matrix i.e. distance of every point with all of other points but I want to don't use from loop because take too time... Is a better way for building this matrix? this is my loop: for a setl with size: 10000x3 this method take a lot of my time :(

 for i=1:size(setl,1)
     for j=1:size(setl,1)        
         dist = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+...
             (zl(i)-zl(j))^2);
         distanceMatrix(i,j) = dist;
     end
 end

回答1:

How about using some linear algebra? The distance of two points can be computed from the inner product of their position vectors,

D(x, y) = ∥y – x∥ = √ ( xT x + yT y – 2 xT y ),

and the inner product for all pairs of points can be obtained through a simple matrix operation.

x = [xl(:)'; yl(:)'; zl(:)'];
IP = x' * x;
d = sqrt(bsxfun(@plus, diag(IP), diag(IP)') - 2 * IP);

For 10000 points, I get the following timing results:

  • ahmad's loop + shoelzer's preallocation: 7.8 seconds
  • Dan's vectorized indices: 5.3 seconds
  • Mohsen's bsxfun: 1.5 seconds
  • my solution: 1.3 seconds


回答2:

You can use bsxfun which is generally a faster solution:

s = [xl(:) yl(:) zl(:)];
d = sqrt(sum(bsxfun(@minus, permute(s, [1 3 2]), permute(s, [3 1 2])).^2,3));


回答3:

You can do this fully vectorized like so:

n = numel(xl);
[X, Y] = meshgrid(1:n,1:n);
Ix = X(:)
Iy = Y(:)
reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);

If you look at Ix and Iy (try it for like a 3x3 dataset), they make every combination of linear indexes possible for each of your matrices. Now you can just do each subtraction in one shot!

However mixing the suggestions of shoelzer and Jost will give you an almost identical performance performance boost:

n = 50;

xl = rand(n,1);
yl = rand(n,1);
zl = rand(n,1);

tic
for t = 1:100
    distanceMatrix = zeros(n); %// Preallocation
    for i=1:n
       for j=min(i+1,n):n %// Taking advantge of symmetry
           distanceMatrix(i,j) = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+(zl(i)-zl(j))^2);
       end
    end
    d1 = distanceMatrix + distanceMatrix';           %'
end
toc


%// Vectorized solution that creates linear indices using meshgrid
tic
for t = 1:100
    [X, Y] = meshgrid(1:n,1:n);
    Ix = X(:);
    Iy = Y(:);
    d2 = reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);
end
toc

Returns:

Elapsed time is 0.023332 seconds.
Elapsed time is 0.024454 seconds.

But if I change n to 500 then I get

Elapsed time is 1.227956 seconds.
Elapsed time is 2.030925 seconds.

Which just goes to show that you should always bench mark solutions in Matlab before writing off loops as slow! In this case, depending on the scale of your solution, loops could be significantly faster.



回答4:

Be sure to preallocate distanceMatrix. Your loops will run much, much faster and vectorization probably isn't needed. Even if you do it, there may not be any further speed increase.



回答5:

The latest versions (Since R2016b) of MATLAB support Implicit Broadcasting (See also noted on bsxfun()).

Hence the fastest way for distance matrix is:

function [ mDistMat ] = CalcDistanceMatrix( mA, mB )

mDistMat = sum(mA .^ 2).' - (2 * mA.' * mB) + sum(mB .^ 2);


end

Where the points are along the columns of the set.
In your case mA = mB.

Have a look on my Calculate Distance Matrix Project.