I have an Nx2 array K1
with the location of N keypoints and a 3 dimensional WxHx3 array Kart1(width,height,coordinates)
that maps coordinates to every pixel of an image. For every keypoint in K1
I want to read the location of the pixel in Kart1
and evaluate the coordinates (search for the min/max or calculate the mean) in a 3x3 kernel around it and assign a value to the current pixel in KPCoor1
.
My current approach looks like this:
for ii=1:length(K1(:,1)) %for every keypoint in K1
MinDist=sqrt(sum(Kart1(K1(ii,2)-1,K1(ii,1)-1,:).^2)); %Calculate distance
xShift=0;
yShift=0;
for kk=-1:1 %for every pixel in a 3x3 kernel...
for ll=-1:1
Distance=sqrt(sum(Kart1(K1(ii,2)+kk,K1(ii,1)+ll,:).^2));
if Distance<MinDist %... if the current distance is smaller than MinDist
MinDist=Distance; %... update MinDist...
xShift=kk; %... and take the kernel coordinate of the pixel
yShift=ll;
end
end
end
KP1Coor(ii,:)=Kart1(K1(ii,2)+xShift,K1(ii,1)+yShift,:); %assign the coordinates of the pixel with the minimal distance in kernel.
end
and it runs, but is ugly and I doubt it's doing what I want to do. I am a bit confused by the "multidimensionality" of the matter, don't know many functions to evaluate kernels, and can't think of a way to use vectorization functions like bsxfun()
or logic operations (means I'm stuck and my brain is dry :/)
Any suggestions on how to eliminate those loops/correct the code?
Vectorized approach: After diving into the vectorized implementation, it looked like a very interesting look-up problem
, so if you are still interested in vectorized techniques to get the job done, here's an approach with bsxfun
-
%// Scalars to replace their repeated usages in the code
[W,H,NC]= size(Kart1);
d3a = 1:NC;
%// Indices for Kart1 at the first level of the nested loops
BI = bsxfun(@plus,K1(:,2)+(K1(:,1)-1)*W,(d3a-1)*W*H);
%// Indices for Kart1 in 3x3 kernel around the indices obtained at first level
BIW3 = bsxfun(@plus,[-1 0 1]',[-W 0 W]); %//'
%// Linear indices for the minimum value of Kart1 in the 3x3 neighborhood
[~,MI3] = min(sqrt(sum(Kart1(bsxfun(@plus,...
BI,permute(BIW3(:),[3 2 1]))).^2,2)),[],3);
%// X-Y indices
[xShift1,yShift1] = ind2sub([3 3],MI3);
%// Get Kart1 values corresponding to the indices for the minimum values
KP1Coor = Kart1(bsxfun(@plus,...
K1(:,2)+xShift1-2 +(K1(:,1)+yShift1-2-1)*W,(d3a-1)*W*H));
Benchmarking
I was able to test it out with GPU too using gpuArray
from Parallel Computing Toolbox and then run some benchmarks with W = 1000
, H = 1000
and used N
as the datasize varying it with [1000 2000 5000 10000 20000]
. The results seem crazy enough, though not unreliable as approved benchmarking methods were used from Measure and Improve GPU Performance
. Here is the benchmark plot for original and CPU and GPU vectorized codes -
It looked appropriate to then benchmark only the vectorized codes and for even larger datasizes, whose plot is shown next -
Conclusions: The problem basically looks like a look-up problem, where Kart1
is the data and K1
is the array of indices to look-up for. The vectorized solution presented here is basically a brute-force approach and the benchmark results clearly favor this for performance. But, it would be interesting to see if any non-brute-force approach that might be even loop-based, but leverages this looking up efficiently could perform better.