Find duplicate entries in an array of strings?

Posted 2019-07-18 03:10

I have a large cell array of strings in MATLAB. I need to find the indices of the duplicate strings in this array; that is, the output I expect is an array of the indices of the strings that appear two or more times in the cell array.

How can I do this?

Tags: matlab
3 answers
Lonely孤独者°
Post #2 · 2019-07-18 03:41

This can be done with unique:

strings = {'some' 'strings' 'with' 'with' 'duplicate' 'strings' 'strings'};
[~, uniqueIdx] = unique(strings) % Find the indices of the unique strings
duplicates = strings % Copy the original into a duplicate array
duplicates(uniqueIdx) = [] % remove the unique strings, anything left is a duplicate
duplicates = unique(duplicates) % find the unique duplicates
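The steps above return the duplicated strings rather than their positions. To get the indices the question asks for, one option (a sketch added here, not part of the original answer) is to look the duplicated strings up in the original array with ismember and convert the logical mask to positions with find:

```matlab
strings = {'some' 'strings' 'with' 'with' 'duplicate' 'strings' 'strings'};
[~, uniqueIdx] = unique(strings);   % one representative index per distinct string
duplicates = strings;
duplicates(uniqueIdx) = [];         % what remains occurs at least twice
duplicates = unique(duplicates);    % distinct duplicated strings: {'strings' 'with'}

% Indices of every element whose string occurs two or more times
dupIdx = find(ismember(strings, duplicates))   % [2 3 4 6 7]
```

This works regardless of whether unique reports the first or the last occurrence of each string, because either way exactly one copy of each distinct string is removed.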
叼着烟拽天下
Post #3 · 2019-07-18 03:46

You can sort the array and then check, for each cell, whether it equals the following cell. The runtime is O(N log N), dominated by the sort. I don't recall a built-in function for that.

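Arr = {'aa' 'bb' 'cc' 'bb'};           % cell array, not a char concatenation
ArrSort = sort(Arr);                   % ArrSort = {'aa' 'bb' 'bb' 'cc'}

% Walk the sorted array; a cell equal to its successor is a duplicate
Dups = {};
for i = 1:length(ArrSort)-1
    if strcmp(ArrSort{i}, ArrSort{i+1}) && ...
            (isempty(Dups) || ~strcmp(Dups{end}, ArrSort{i}))
        Dups{end+1} = ArrSort{i};      % record each duplicated string once
    end
end
% Dups = {'bb'}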
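Since the question asks for indices, here is a minimal sketch (an addition, not the original answer's code) of the same sort-and-compare-neighbours idea that maps the result back to positions in the original array. It relies on the second output of sort, which gives the original index of each sorted element, and on strcmp comparing two equal-sized cell arrays element by element:

```matlab
Arr = {'aa' 'bb' 'cc' 'bb'};
[ArrSort, order] = sort(Arr);        % order(k) is the position in Arr of ArrSort{k}

% true where a sorted cell equals the cell right after it
same = strcmp(ArrSort(1:end-1), ArrSort(2:end));

% mark both members of every equal neighbouring pair
isDup = [same false] | [false same];

dupIdx = sort(order(isDup))          % original positions of duplicated strings: [2 4]
```

The comparison is vectorized, so no explicit loop is needed, and the cost is still dominated by the O(N log N) sort.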
叼着烟拽天下
Post #4 · 2019-07-18 04:02

Another approach: get integer labels using unique, count their occurrences with histc, and pick the labels that appear more than once:

str = {'hello' 'bye' 'hi' 'farewell' 'hello' 'morning' 'bye' 'bye'}; %// data
[uniqueStr, ~, ind] = unique(str); %// uniqueStr(ind) equals str
repeatedStr = uniqueStr(histc(ind,1:max(ind))>1); %// result
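MATLAB's documentation marks histc as not recommended in recent releases. A sketch of the same labeling idea that counts with accumarray instead (a substitution made here, not the answer's original choice), and that also returns the positions the question asks about by looking up each element's label count:

```matlab
str = {'hello' 'bye' 'hi' 'farewell' 'hello' 'morning' 'bye' 'bye'};
[uniqueStr, ~, ind] = unique(str);   % str{k} equals uniqueStr{ind(k)}
counts = accumarray(ind(:), 1);      % counts(j) = occurrences of uniqueStr{j}

repeatedStr = uniqueStr(counts > 1)  % strings appearing two or more times: 'bye', 'hello'
dupIdx = find(counts(ind) > 1)       % positions in str of those strings: 1 2 5 7 8
```

Indexing counts with ind gives, for every element of str, how often its string occurs, so one comparison yields a mask over the original array.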