How to find closest (nearest) value within a vector

Published 2019-05-14 22:14

Question:

I have two equal sized vectors, e.g.

A=[2.29 2.56 2.77 2.90 2.05] and
B=[2.34 2.62 2.67 2.44 2.52].

I want to find the closest (almost equal) values across two same-sized vectors A and B, i.e. out of all the elements in A, which value is closest to any element of B? The solution should also extend to any number of equal-sized vectors, i.e. finding the closest values among a group of same-sized vectors A, B and C. The two resulting values can come from either of the two vectors.

To be clear, I am not interested in finding the closest values within a single vector. The answer for the example above is 2.56 and 2.52.
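The expected result can be checked with a quick brute-force sketch (shown here in NumPy for illustration; it is not from the original thread):

```python
import numpy as np

A = np.array([2.29, 2.56, 2.77, 2.90, 2.05])
B = np.array([2.34, 2.62, 2.67, 2.44, 2.52])

# Pairwise absolute differences: D[i, j] = |A[i] - B[j]|
D = np.abs(A[:, None] - B[None, :])

# Position of the smallest cross-vector difference
i, j = np.unravel_index(np.argmin(D), D.shape)
closest = (A[i], B[j])  # (2.56, 2.52)
```

Broadcasting builds the full M-by-N difference matrix, so memory grows as O(M*N), which is fine at this size.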

Answer 1:

This works for a generic number of vectors of possibly different lengths:

vectors = {[2.29 2.56 2.77 2.90 2.05] [2.34 2.62 2.67 2.44 2.52] [1 2 3 4]}; 
    % Cell array of data vectors; 3 in this example
s = cellfun(@numel, vectors); % Get vector lengths
v = [vectors{:}]; % Concatenate all vectors into a vector
D = abs(bsxfun(@minus, v, v.')); % Compute distances. This gives a matrix.
    % Distances within the same vector will have to be discarded. This will be
    % done by replacing those values with NaN, in blocks
bb = arrayfun(@(x) NaN(x), s, 'uniformoutput', false); % Cell array of blocks
B = blkdiag(bb{:}); % NaN mask with those blocks
[~, ind] = min(D(:) + B(:)); % Add the mask (min ignores NaN); get arg min as linear index
[ii, jj] = ind2sub(size(D), ind); % Convert to row and column indices
result = v([ii jj]); % Index into concatenated vector
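The same idea (concatenate everything, then mask out within-vector distances block by block) can be sketched outside MATLAB as well. This NumPy port is only an illustration; it uses +inf instead of NaN for the mask, since np.argmin does not skip NaN the way MATLAB's min does:

```python
import numpy as np

vectors = [[2.29, 2.56, 2.77, 2.90, 2.05],
           [2.34, 2.62, 2.67, 2.44, 2.52],
           [1, 2, 3, 4]]

v = np.concatenate([np.asarray(x, dtype=float) for x in vectors])
D = np.abs(v[:, None] - v[None, :])  # all pairwise distances

# Blank out the within-vector diagonal blocks with +inf
offsets = np.cumsum([0] + [len(x) for x in vectors])
for a, b in zip(offsets[:-1], offsets[1:]):
    D[a:b, a:b] = np.inf

i, j = np.unravel_index(np.argmin(D), D.shape)
result = v[[i, j]]  # closest cross-vector pair
```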


Answer 2:

As a starting point for two vectors using bsxfun:

%// data
A = [2.29 2.56 2.77 2.90 2.05]
B = [2.34 2.62 2.67 2.44 2.52]

%// distance matrix 
dist = abs(bsxfun(@minus,A(:),B(:).'));

%// find row and col indices of minimum
[~,idx] = min(dist(:))
[ii,jj] = ind2sub( [numel(A), numel(B)], idx)

%// output 
a = A(ii)
b = B(jj)

Now you can put this into a loop, etc.
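As a sketch of that loop idea (again in NumPy, purely for illustration): wrap the two-vector routine in a function and run it over every unordered pair of vectors, keeping the global minimum:

```python
from itertools import combinations
import numpy as np

def closest_pair(A, B):
    """Closest pair of values between two 1-D sequences (brute force)."""
    D = np.abs(np.asarray(A)[:, None] - np.asarray(B)[None, :])
    i, j = np.unravel_index(np.argmin(D), D.shape)
    return D[i, j], A[i], B[j]

vectors = [[2.29, 2.56, 2.77, 2.90, 2.05],
           [2.34, 2.62, 2.67, 2.44, 2.52],
           [1.0, 2.0, 3.0, 4.0]]

# Run the two-vector routine on every unordered pair; keep the global minimum
best = min(closest_pair(X, Y) for X, Y in combinations(vectors, 2))
dist, a, b = best  # distance ~0.04; values 2.56 and 2.52
```

The loop runs over n*(n-1)/2 vector pairs, so the fully vectorized version below scales better for many vectors.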


By the way:

dist = abs(bsxfun(@minus,A(:),B(:).'));

would be equivalent to the more obvious:

dist = pdist2( A(:), B(:) )

but I'd rather go with the first solution to avoid the overhead.


And finally the fully vectorized approach for multiple vectors:

%// data
data{1} = [2.29 2.56 2.77 2.90 2.05];
data{2} = [2.34 2.62 2.67 2.44 2.52];
data{3} = [2.34 2.62 2.67 2.44 2.52].*2;
data{4} = [2.34 2.62 2.67 2.44 2.52].*4;
%// length of each vector (all assumed equal here)
N = 5;

%// create Filter for distance matrix
nans(1:numel(data)) = {NaN(N)};
mask = blkdiag(nans{:}) + 1; 

%// create new input for bsxfun
X = [data{:}];

%// filtered distance matrix 
dist = mask.*abs(bsxfun(@minus,X(:),X(:).'));

%// find row and col indices of minimum
[~,idx] = min(dist(:))
[ii,jj] = ind2sub( size(dist), idx)

%// output 
a = X(ii)
b = X(jj)


Answer 3:

Just as a long comment: if you have access to the Statistics and Machine Learning Toolbox, you could use its K-Nearest Neighbors functions, which have some pros:

  1. Handling arrays of different lengths, for example when size(A) = [M, 1] and size(B) = [N, 1]

  2. Handling two-dimensional arrays, for example when size(A) = [M, d] and size(B) = [N, d]

  3. Handling different distance types, for example Euclidean, city block, Chebychev and many others, including your own custom distances

  4. Using the KD-tree algorithm in special cases, which gives great performance

Although in your case the answer from "Luis Mendo" seems pretty nice, it is not as extendable as what the K-Nearest Neighbors functions from the toolbox offer.

Update: sample code

% A and B can have any number of rows; they just need the same number of columns (the signal dimension)
A = rand(1000,4);
B = rand(500,4);

% Use any distance you like. Some distances are not supported by KDTreeSearcher;
% for those you should use ExhaustiveSearcher instead
myKnnModel= KDTreeSearcher(A, 'Distance', 'minkowski');

% You can ask for many (K) nearest neighbors and keep them for later use
[Idx, D] = knnsearch(myKnnModel, B, 'K',2);

% And this answers your special case
[~, idxA] = min(D(:, 1))
idxB = Idx(idxA)
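Without the toolbox, the same query can be brute-forced. This NumPy sketch (an illustration, not from the thread) mirrors the knnsearch call above with Euclidean distance; scikit-learn's NearestNeighbors would be the KD-tree counterpart:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 4))  # reference points (the "model")
B = rng.random((500, 4))   # query points

# Euclidean distance from every row of B to every row of A
D = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=2)  # shape (500, 1000)

Idx = np.argmin(D, axis=1)         # nearest A-row for each B-row
Dmin = D[np.arange(len(B)), Idx]   # its distance

# The single closest cross-set pair, as in the MATLAB snippet
idxB = np.argmin(Dmin)
idxA = Idx[idxB]
```

Note the brute-force matrix is 500 x 1000 here; for much larger point sets a KD-tree query is the better choice, which is exactly the toolbox's selling point.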