Remove for loop from clustering algorithm in MATLAB

Posted 2019-01-28 09:34

I am trying to improve the performance of the OPTICS clustering algorithm. The open-source implementation I found uses a for loop over every sample and can run for hours.

I believe using repmat() could improve performance when the system has enough RAM. Suggestions for other ways to improve the implementation are welcome.

Here is the code:

x is the data: an [m x n] array, where m is the number of samples and n is the feature dimensionality, which is usually much greater than one.

[m,n] = size(x);

for i = 1:m
    D(i,:) = sum(((repmat(x(i,:),m,1)-x).^2),2).';
end

Many thanks.

1 answer
Melony?
#2 · 2019-01-28 10:20

With enough RAM to play with, you can use a few approaches here.

Approach #1: With bsxfun & permute -

D = squeeze(sum(bsxfun(@minus,permute(x,[3 2 1]),x).^2,2))
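A rough way to judge whether there is "enough RAM" for approach #1: bsxfun builds an m-by-n-by-m intermediate array before the sum collapses it, so memory grows with m^2*n. The sketch below only does that arithmetic; the sizes are hypothetical and chosen for illustration, and double precision (8 bytes per element) is assumed.

```matlab
% Hypothetical sizes, chosen only to illustrate the scaling.
m = 10000;            % number of samples (assumed)
n = 50;               % number of features (assumed)
bytesPerDouble = 8;   % double precision

tempBytes   = m * n * m * bytesPerDouble;   % m-by-n-by-m intermediate of approach #1
resultBytes = m * m * bytesPerDouble;       % final m-by-m matrix D

fprintf('intermediate: %.1f GB, result: %.1f GB\n', tempBytes/1e9, resultBytes/1e9);
```

With these sizes the intermediate alone is about 40 GB, while the final D needs about 0.8 GB, so approach #1 is mainly practical for modest m.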

Approach #2: With pdist & squareform (both from the Statistics and Machine Learning Toolbox) -

D = squareform(pdist(x).^2)

Approach #3: With matrix-multiplication based Euclidean distance calculations -

xt = x.';
[m,n] = size(x);
D = [x.^2 ones(size(x)) -2*x ]*[ones(size(xt)) ; xt.^2 ; xt];
D(1:m+1:end) = 0;
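The reason the single product above yields squared distances can be seen by expanding one entry. Row i of the left factor is [x_i.^2, ones, -2*x_i] and column j of the right factor is [ones; x_j.^2; x_j], so, writing x_{ik} for feature k of sample i:

```latex
\[
D_{ij}
  = \sum_{k=1}^{n} x_{ik}^{2}
  + \sum_{k=1}^{n} x_{jk}^{2}
  - 2 \sum_{k=1}^{n} x_{ik}\, x_{jk}
  = \sum_{k=1}^{n} \left( x_{ik} - x_{jk} \right)^{2}
\]
```

In exact arithmetic the diagonal is zero, but floating-point round-off can leave tiny nonzero values there, which is why the last line sets it to 0 explicitly. The same round-off can also make off-diagonal entries for nearly identical samples come out slightly negative.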

For performance, my bet would be on approach #3!
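Before switching implementations, it may be worth checking on a small input that the vectorized versions agree with the original loop. The following is a minimal sketch, assuming random test data of an arbitrary size; the names D_loop, D1, D3, err1 and err3 are introduced here for illustration. Approach #2 is left out so the check runs without any toolbox.

```matlab
% Sanity check on small random data (sizes are arbitrary assumptions).
rng(0);                 % fixed seed for repeatability
x = rand(200, 5);       % 200 samples, 5 features
[m, n] = size(x);

% Reference: the original loop, with D preallocated.
D_loop = zeros(m);
for i = 1:m
    D_loop(i,:) = sum((repmat(x(i,:), m, 1) - x).^2, 2).';
end

% Approach #1: bsxfun & permute.
D1 = squeeze(sum(bsxfun(@minus, permute(x, [3 2 1]), x).^2, 2));

% Approach #3: matrix multiplication.
xt = x.';
D3 = [x.^2, ones(m, n), -2*x] * [ones(n, m); xt.^2; xt];
D3(1:m+1:end) = 0;

% Largest absolute deviations from the loop result.
err1 = max(abs(D1(:) - D_loop(:)));
err3 = max(abs(D3(:) - D_loop(:)));
fprintf('max error: approach #1 = %g, approach #3 = %g\n', err1, err3);
```

Both errors should be at round-off level. To compare speed on real data, each variant can be wrapped in tic/toc or passed to timeit.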
