I am trying to improve the performance of the OPTICS clustering algorithm. The open-source implementation I've found uses a for loop over every sample and can run for hours...
I believe using the repmat() function may help improve its performance when the system has enough RAM. You are more than welcome to suggest other ways of improving the implementation.
Here is the code:
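x is the data: an m-by-n array, where m is the number of samples and n is the feature dimensionality, which is usually much greater than one.

```matlab
[m, n] = size(x);
D = zeros(m, m);   % preallocate the m-by-m matrix of squared distances
for i = 1:m
    % squared Euclidean distance from sample i to every sample
    D(i,:) = sum((repmat(x(i,:), m, 1) - x).^2, 2).';
end
```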
Many thanks.
With enough RAM to play with, you can use a few approaches here.
Approach #1: With `bsxfun` & `permute`
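A minimal sketch of approach #1, assuming x is an m-by-n double array (random data stands in for it here). It builds the full m-by-n-by-m array of pairwise differences in one shot, so memory grows as m*m*n doubles; that is the "enough RAM" trade-off.

```matlab
x = rand(200, 10);        % hypothetical example data: 200 samples, 10 features
[m, n] = size(x);
% permute(x, [3 2 1]) is 1-by-n-by-m, so bsxfun expands both operands to
% m-by-n-by-m, where element (i,k,j) = x(i,k) - x(j,k)
diffs = bsxfun(@minus, x, permute(x, [3 2 1]));
% summing squares over the feature dimension gives an m-by-1-by-m array;
% squeeze drops the singleton dimension to leave the m-by-m result
D1 = squeeze(sum(diffs.^2, 2));
```

Like the loop in the question, this yields squared distances; D1(i,j) holds the squared distance between samples i and j.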
Approach #2: With `pdist` & `squareform`
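A sketch of approach #2. It assumes pdist and squareform are available (in MATLAB they ship with the Statistics and Machine Learning Toolbox). Because the loop in the question computes squared distances, the sketch squares the output of pdist before expanding it.

```matlab
x = rand(200, 10);        % hypothetical example data: 200 samples, 10 features
m = size(x, 1);
% pdist returns the m*(m-1)/2 pairwise Euclidean distances as a row vector;
% squaring them matches the squared distances computed by the loop
d = pdist(x).^2;
% squareform expands the condensed vector into the symmetric m-by-m matrix
% with zeros on the diagonal
D2 = squareform(d);
```

Only the upper triangle is computed here, which roughly halves the arithmetic compared with the loop.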
Approach #3: With matrix-multiplication-based Euclidean distance calculations
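A sketch of approach #3, based on the identity ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2*xi*xj.'; random data again stands in for x. The only O(m*m*n) step is the product x*x.'.

```matlab
x = rand(200, 10);        % hypothetical example data: 200 samples, 10 features
m = size(x, 1);
% squared norm of each sample, as an m-by-1 column
sq = sum(x.^2, 2);
% ||xi||^2 + ||xj||^2 via broadcasting, minus twice the Gram matrix x*x.'
D3 = bsxfun(@plus, sq, sq.') - 2 * (x * x.');
% floating-point cancellation can leave tiny negative values where the
% true distance is zero (e.g. on the diagonal), so clamp them
D3(D3 < 0) = 0;
```

The intermediate storage is only m-by-m, unlike approach #1, which needs m-by-n-by-m.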
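For performance, my bet would be on approach #3, since it turns the bulk of the work into a single matrix multiplication (x*x.'), which MATLAB runs through its optimized, multithreaded BLAS routines.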
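A rough way to test that bet on your own hardware is to time the original loop against approaches #1 and #3 with tic/toc. The data size below is an arbitrary assumption; substitute your own data, and expect the timings to depend on m, n and the machine.

```matlab
x = rand(500, 20);        % arbitrary test size; replace with your own data
m = size(x, 1);

tic;                      % original repmat loop from the question
D0 = zeros(m, m);
for i = 1:m
    D0(i,:) = sum((repmat(x(i,:), m, 1) - x).^2, 2).';
end
t_loop = toc;

tic;                      % approach #1: bsxfun + permute
D1 = squeeze(sum(bsxfun(@minus, x, permute(x, [3 2 1])).^2, 2));
t_bsxfun = toc;

tic;                      % approach #3: matrix multiplication, negatives clamped to 0
sq = sum(x.^2, 2);
D3 = max(bsxfun(@plus, sq, sq.') - 2 * (x * x.'), 0);
t_matmul = toc;

fprintf('loop: %.4f s, bsxfun: %.4f s, matmul: %.4f s\n', t_loop, t_bsxfun, t_matmul);
```

For a more stable measurement, repeat each timing several times (or use timeit in MATLAB) and compare the medians.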