MATLAB: remove duplicate values

Posted 2019-02-24 07:18

Question:

I'm fairly new to programming in general and to MATLAB, and I'm having trouble removing values from a matrix.

I have a matrix tmp2 with values:

tmp2 = [...      ...
        0.6000   20.4000
        0.7000   20.4000
        0.8000   20.4000
        0.9000   20.4000
        1.0000   20.4000
        1.0000   19.1000
        1.1000   19.1000
        1.2000   19.1000
        1.3000   19.1000
        1.4000   19.1000
        ...      ...];

How can I remove the row where the left column is 1.0 but the value in the right column differs? I want to keep the row with 19.1. I searched for solutions, but the ones I found use the histc function and delete both rows, which is not what I need.

Thanks

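Answer 1:

I saw the solution with unique and wanted to offer one with loops. You can compare the two to see which one is faster :D The loop could probably be improved: as written, it assumes that duplicates sit in adjacent rows and, for each duplicated value, keeps the smaller of the first two right-column values (19.1 here).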

clear
tmp = [0.6000   20.4000
        0.7000   20.4000
        0.8000   20.4000
        0.9000   20.4000
        1.0000   20.4000
        1.0000   19.1000
        1.1000   19.1000
        1.2000   19.1000
        1.3000   19.1000
        1.4000   19.1000];

ltmp = size(tmp, 1);   % number of rows (length() would return the largest dimension)
jj = 1;
for ii = 1 : ltmp
    if ii > 1
        if tmp(ii, 1) == tmp(ii - 1, 1)
            continue
        end
    end
    if ii < ltmp
        if tmp(ii, 1) == tmp(ii + 1, 1)
            tmp2(jj,1) = tmp(ii, 1);
            tmp2(jj,2) = min(tmp(ii, 2),tmp(ii + 1, 2));
        else
            tmp2(jj, 1) = tmp(ii, 1);
            tmp2(jj, 2) = tmp(ii, 2);
        end
    else
            tmp2(jj, 1) = tmp(ii, 1);
            tmp2(jj, 2) = tmp(ii, 2);
    end
    jj = jj + 1;
end
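
Since the duplicates in this data sit in adjacent rows, the same result can be sketched without an explicit loop. This is a minimal sketch, assuming that equal values in the first column are always consecutive and that the last row of each run is the one to keep (as the question asks); the variable name keep is made up for illustration.

```matlab
tmp = [0.6 20.4
       0.7 20.4
       0.8 20.4
       0.9 20.4
       1.0 20.4
       1.0 19.1
       1.1 19.1
       1.2 19.1
       1.3 19.1
       1.4 19.1];

% A row is the last of its run when the key in the next row differs;
% the final row of the matrix is always the last of its run.
keep = [diff(tmp(:, 1)) ~= 0; true];
tmp2 = tmp(keep, :);
```

Unlike the loop, this keeps whatever is stored in the last row of a run instead of taking the minimum, and it handles runs of any length. It relies on exact equality, so values that differ only by floating-point round-off would not be treated as duplicates.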


Answer 2:

You can do this using unique:

>> [~,b] = unique(tmp2(:,1), 'last'); % indices of the last occurrence of each value in the first column of tmp2
>> tmp2(b,:)                          % values at these rows
ans =
    0.6000   20.4000
    0.7000   20.4000
    0.8000   20.4000
    0.9000   20.4000
    1.0000   19.1000
    ...

Older MATLAB releases keep the last occurrence by default, but newer ones keep the first, so the 'last' flag is passed explicitly here. The output is also sorted by the first column, which happens to match the order you already have, so you're in luck :)
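
To see what the occurrence flag changes on the sample data, here is a small sketch. The variable names iFirst, iLast, keptFirst and keptLast are made up for illustration; only the row chosen for 1.0 differs between the two results.

```matlab
tmp2 = [0.6 20.4
        0.7 20.4
        0.8 20.4
        0.9 20.4
        1.0 20.4
        1.0 19.1
        1.1 19.1
        1.2 19.1
        1.3 19.1
        1.4 19.1];

% Index of the first and of the last row for each distinct key
[~, iFirst] = unique(tmp2(:, 1), 'first');
[~, iLast]  = unique(tmp2(:, 1), 'last');

keptFirst = tmp2(iFirst, :);   % the 1.0 row carries 20.4
keptLast  = tmp2(iLast, :);    % the 1.0 row carries 19.1
```

Passing the flag explicitly avoids depending on whichever default the installed release uses.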

If this is not what you want/have, you'll have to tinker a bit more. Removing duplicates preserving the order goes like this:

% mess up the order
A = randperm(size(tmp2,1));
tmp2 = tmp2(A,:)

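% index of the last occurrence of each value
[~,b] = unique(tmp2(:,1), 'last');

% unique values, order preserved: put the indices back into row order
tmp2(sort(b),:)

ans =
    1.1000   19.1000
    1.2000   19.1000
    0.7000   20.4000
    1.0000   20.4000
    1.4000   19.1000
    0.6000   20.4000
    0.9000   20.4000
    1.3000   19.1000
    0.8000   20.4000
    ...

which keeps the last entry found for each value (with this particular shuffle, that is the 20.4 row for 1.0). Note that indexing with b(c), where c is the third output of unique, does not remove anything: it returns one row per original row, with every duplicate replaced by its representative. If you want to keep the first entry found, use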

% unique values, order preserved, keep first occurrence
[a,b,c] = unique(tmp2(:,1), 'first');
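
In releases where unique accepts the 'stable' option, order-preserving removal that keeps the first occurrence can be requested directly. This is a minimal sketch on a small hand-shuffled sample; the matrix shuffled and its row order are made up for illustration.

```matlab
% Hand-shuffled sample: the key 1.0 appears in rows 2 and 4
shuffled = [1.1 19.1
            1.0 19.1
            0.7 20.4
            1.0 20.4
            0.6 20.4];

% 'stable' returns the values in order of first appearance;
% ia holds the row index of each first occurrence
[~, ia] = unique(shuffled(:, 1), 'stable');
firstKept = shuffled(ia, :);
```

Here row 4 is dropped and the remaining rows stay in their original order.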


Answer 3:

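Use unique without the 'rows' option, so that only the first column decides which rows count as duplicates:

[~, ia] = unique( tmp2(:,1), 'last' ); % 'last' keeps the 19.1 row regardless of the release default
C = tmp2( ia, : );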