I have an Nx2 array K1
with the location of N keypoints and a 3 dimensional WxHx3 array Kart1(width,height,coordinates)
that maps coordinates to every pixel of an image. For every keypoint in K1
I want to read the location of the pixel in Kart1
and evaluate the coordinates (search for the min/max or calculate the mean) in a 3x3 kernel around it and assign a value to the current pixel in KPCoor1
.
My current approach looks like this:
for ii=1:length(K1(:,1)) %for every keypoint in K1
MinDist=sqrt(sum(Kart1(K1(ii,2)-1,K1(ii,1)-1,:).^2)); %Calculate distance
xShift=0;
yShift=0;
for kk=-1:1 %for every pixel in a 3x3 kernel...
for ll=-1:1
Distance=sqrt(sum(Kart1(K1(ii,2)+kk,K1(ii,1)+ll,:).^2));
if Distance<MinDist %... if the current distance is smaller than MinDist
MinDist=Distance; %... update MinDist...
xShift=kk; %... and take the kernel coordinate of the pixel
yShift=ll;
end
end
end
KP1Coor(ii,:)=Kart1(K1(ii,2)+xShift,K1(ii,1)+yShift,:); %assign the coordinates of the pixel with the minimal distance in kernel.
end
and it runs, but is ugly and I doubt it's doing what I want to do. I am a bit confused by the "multidimensionality" of the matter, don't know many functions to evaluate kernels, and can't think of a way to use vectorization functions like bsxfun()
or logic operations (means I'm stuck and my brain is dry :/)
Any suggestions on how to eliminate those loops/correct the code?
Vectorized approach: After diving into the vectorized implementation, it looked like a very interesting
look-up problem
, so if you are still interested in vectorized techniques to get the job done, here's an approach withbsxfun
-Benchmarking
I was able to test it out with GPU too using
gpuArray
from Parallel Computing Toolbox and then run some benchmarks withW = 1000
,H = 1000
and usedN
as the datasize varying it with[1000 2000 5000 10000 20000]
. The results seem crazy enough, though not unreliable as approved benchmarking methods were used fromMeasure and Improve GPU Performance
. Here is the benchmark plot for original and CPU and GPU vectorized codes -It looked appropriate to then benchmark only the vectorized codes and for even larger datasizes, whose plot is shown next -
Conclusions: The problem basically looks like a look-up problem, where
Kart1
is the data andK1
is the array of indices to look-up for. The vectorized solution presented here is basically a brute-force approach and the benchmark results clearly favor this for performance. But, it would be interesting to see if any non-brute-force approach that might be even loop-based, but leverages this looking up efficiently could perform better.