MATLAB: find the most similar rows in 2 matrices

Posted 2019-09-20 16:39

Question:

I have 2 matrices

A = [66 1 29.2;
     80 0 29.4;
     80 0 29.4;
     79 1 25.6];

B = [66 1 28.2;
     79 0 28.4;
     66 1 27.6;
     80 0 22.4]

I would like to find the indices of the matching rows.

indx = [1 1;
        2 4;
        3 2;
        4 3]

indx means that row 1 of A matches row 1 of B, row 2 of A matches row 4 of B, and so on. The matching should be pairwise (each row of A is matched with exactly one row of B). The values in column 2 must match exactly. For the values in columns 1 and 3 it should be the best match (i.e. if a pair with identical values exists, use it; otherwise pick the closest).

Can you help me? Thanks.

EDIT: MORE INSIGHTS ON THE QUESTION DERIVED FROM ANDREW'S COMMENT

Row 3 of A can't match row 4 of B because row 4 of B was already matched with row 2 of A. Row 2 of A matches row 4 of B because the first two elements (80, 0) match and the error in the last element is small (29.4 - 22.4 = 7). In other words, matching the 2nd column of A and B is more important than matching the 1st column, which in turn is more important than matching the 3rd column.

Answer 1:

The problem leaves a lot to the imagination:

  • What is the criterion that two rows are similar? (What metric?)
  • Is it better if a few rows match perfectly and a few match only moderately well, or if all rows match fairly well?

One way to solve it is to compute the pairwise distances using pdist2 (part of the Statistics and Machine Learning Toolbox) and then compute a stable matching (stable marriage) based on those distances. This prefers a few perfect matches over many good matches. You could use an existing implementation of such a matching algorithm; Hanan Kavitz provides one on the File Exchange.

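% Compute pairwise distances: D(i,j) is the distance between A(i,:) and B(j,:)
D = pdist2(A, B, 'euclidean');
% Compute preferences based on distance: rank the rows of B for each row of A,
% then reverse the column order
[~,A2B] = sort(D,2);
A2B = A2B(:,end:-1:1);
% Compute the matching
J = stableMatching(A2B,A2B.');
% Each row of matches is [row of A, matched row of B]
matches = [(1:size(A,1)).',J]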

This is quite flexible, since you can change the metric passed to pdist2 to suit your definition of similarity. Several metrics are already implemented, and you can also supply your own as a function handle.
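As an illustration of supplying your own metric, here is a minimal sketch of a custom distance handle for pdist2. It is not part of the original answer: the weights w and the penalty value are assumptions chosen to mimic the priorities described in the question (column 2 must match, column 1 matters more than column 3), and the implicit expansion used in the subtraction assumes MATLAB R2016b or later.

```matlab
A = [66 1 29.2; 80 0 29.4; 80 0 29.4; 79 1 25.6];
B = [66 1 28.2; 79 0 28.4; 66 1 27.6; 80 0 22.4];

w       = [2 1];   % assumed weights for columns 1 and 3
penalty = 1e6;     % assumed large cost for a mismatch in column 2

% pdist2 calls the handle with ZI = a single row (1-by-n) and
% ZJ = several rows (m-by-n); it must return an m-by-1 vector of distances.
rowDist = @(ZI, ZJ) abs(ZJ(:,[1 3]) - ZI([1 3])) * w.' ...
                    + penalty * (ZJ(:,2) ~= ZI(2));

D = pdist2(A, B, rowDist);   % D(i,j): distance between A(i,:) and B(j,:)
```

A finite penalty is used rather than Inf so that the expression stays a one-line anonymous function (Inf multiplied by a logical 0 would give NaN). Any pair whose distance is at least the penalty can be treated as not allowed.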



Answer 2:

EDIT 2: SOLUTION

Thanks to the comments provided, I managed to come up with a "not elegant" but working solution.

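B_rem = B;                      % working copy of B; matched rows get masked out
weights_error = [2 4 1];        % per-column weights for the absolute error
match = zeros(size(A,1),2);     % each row: [row of A, matched row of B]

for i = 1 : size(A,1)
    % Weighted absolute error between row i of A and every row of B_rem
    score = zeros(size(B_rem,1),1);
    for j = 1 : size(B_rem,1)
        score(j) = sum(abs(A(i,:) - B_rem(j,:)).*weights_error);
    end
    [~,idxmin] = min(score);
    match(i,:) = [i,idxmin];
    % Mask the matched row with Inf so it cannot be picked again
    B_rem(idxmin,:) = Inf;
end

indx = match;

% Matched rows side by side: A in columns 1:3, B in columns 5:7 (column 4 stays 0)
table_match = zeros(size(A,1),7);
table_match(:,1:3) = A(match(:,1),:);
table_match(:,5:7) = B(match(:,2),:);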
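The weighted score above makes a mismatch in column 2 expensive but does not forbid it. A possible variant, sketched here as an assumption rather than as the author's method, excludes rows of B whose column 2 differs and tracks used rows with a logical mask (the variable available is introduced only for this illustration). The inner loop is replaced by a matrix product, which relies on implicit expansion (R2016b or later).

```matlab
A = [66 1 29.2; 80 0 29.4; 80 0 29.4; 79 1 25.6];
B = [66 1 28.2; 79 0 28.4; 66 1 27.6; 80 0 22.4];

weights_error = [2 4 1];
available = true(size(B,1),1);   % hypothetical mask: rows of B not yet matched
match = zeros(size(A,1),2);

for i = 1:size(A,1)
    % Weighted absolute error from row i of A to every row of B at once
    score = abs(B - A(i,:)) * weights_error.';
    score(~available) = Inf;            % row of B already taken
    score(B(:,2) ~= A(i,2)) = Inf;      % column 2 must match exactly
    [best, idxmin] = min(score);
    if isfinite(best)
        match(i,:) = [i, idxmin];
        available(idxmin) = false;
    else
        match(i,:) = [i, NaN];          % no admissible row of B left
    end
end
```

For the matrices in the question this gives match = [1 1; 2 2; 3 4; 4 3]. It differs from the requested indx only in which of the two identical rows of A (rows 2 and 3) is paired with rows 2 and 4 of B. Like the loop above, this is a greedy assignment in row order, so it does not guarantee the smallest total error.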