How to remove duplicate lines from a file

I have a tool that generates tests and predicts the output. The idea is that if I have a failure I can compare the prediction to the actual output and see where they diverged. The problem is the actual output contains some lines twice, which confuses diff. I want to remove the duplicates, so that I can compare them easily. Basically, something like sort -u but without the sorting.

Is there any unix command line tool that can do this?

Tags: unix command-line duplicates

5 answers

男人必须洒脱

Reply #2 · 2019-01-23 04:55

Here's what I came up with while waiting for an answer here (though the first, and accepted, answer arrived in about 2 minutes). I used this substitution in Vim:

%s/^\(.*\)\n\1$/\1/

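This looks for a line that is followed, after the newline, by an identical line, and replaces the pair with the single captured line. It only handles adjacent duplicates, and a run of three or more identical lines may need more than one pass.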

uniq is definitely easier, though.

叛逆

Reply #3 · 2019-01-23 04:58

Here is an awk implementation, in case the environment does not have or allow Perl (I haven't seen one yet). It prints the key of every line whose key has already been seen. PS: if a key occurs more than twice, it is printed once for each extra occurrence.

awk '{
    # Extract the key on which duplicates are determined.
    key = substr($0, 2, 14)

    # If the key has not been seen before, record it; otherwise print it.
    if (!s[key])
        s[key] = 1
    else
        print key
}'
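
The script above reports duplicated keys. If the goal is instead to drop the later lines and keep the first line for each key, a minimal sketch could look like the following. The function name dedupe_by_key and the sample input are made up for illustration; the key (characters 2 to 15 of each line) is the same one used above.

```shell
# Hypothetical helper: keep the first line seen for each key, drop later ones.
# The key is characters 2-15 of the line, as in the script above.
dedupe_by_key() {
    awk '{
        key = substr($0, 2, 14)
        # Print the whole line only the first time its key appears.
        if (!(key in seen)) {
            seen[key] = 1
            print
        }
    }' "$@"
}

# Illustrative input: the first and third lines share the same key.
printf '%s\n' 'Xalpha-0000001 first' 'Ybeta--0000002 second' 'Zalpha-0000001 third' | dedupe_by_key
```

With this input the third line is dropped, because its key matches that of the first line. Input order is otherwise preserved.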

我想做一个坏孩纸

Reply #4 · 2019-01-23 05:04

This is complementary to the uniq answers, which work great if you don't mind sorting your file first. If you need to remove non-adjacent duplicate lines (or want to remove duplicates without rearranging your file), the following Perl one-liner, which I found elsewhere, should do it:

perl -ne '$H{$_}++ or print' textfile

该账号已被封号

Reply #5 · 2019-01-23 05:06

uniq(1)

SYNOPSIS

uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
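
As a quick illustration of what "successive" means in that description, here is a small sketch with made-up input:

```shell
# uniq only collapses runs of identical neighbouring lines.
printf '%s\n' a a b a | uniq
# The trailing "a" survives because a "b" separates it from the first run.
```

The output is three lines (a, b, a), so uniq on its own is enough only when the repeated lines sit next to each other.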

Or, if you want to remove non-adjacent duplicate lines as well, this fragment of Perl will do it:

while(<>) {
    print $_ if (!$seen{$_});
    $seen{$_}=1;
}

Ridiculous、

Reply #6 · 2019-01-23 05:07

If you are interested in removing adjacent duplicate lines, use uniq.

If you want to remove all duplicate lines, not just adjacent ones, then it's trickier.
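
One common way to handle the non-adjacent case without sorting is to remember every line already printed. The sketch below does that with awk; the function name dedupe_keep_order and the sample input are assumptions for illustration, not part of any standard tool.

```shell
# Hypothetical helper: print each distinct line once, in first-seen order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so the
# negation is true only then, and awk's default action prints the line.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}

# Illustrative input with non-adjacent repeats.
printf '%s\n' pear apple pear fig apple | dedupe_keep_order
```

This behaves like sort -u without the sorting: pear, apple, fig come out in their original order. The trade-off is that every distinct line is held in memory, which may matter for very large files.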
