I have a tool that generates tests and predicts the output. The idea is that if I have a failure I can compare the prediction to the actual output and see where they diverged. The problem is the actual output contains some lines twice, which confuses diff. I want to remove the duplicates so that I can compare them easily. Basically, something like sort -u but without the sorting.
Is there any unix command line tool that can do this?
Here's what I came up with while I was waiting for an answer here (though the first, and accepted, answer came in about 2 minutes). I used a substitution along these lines in VIM:
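    :%s/^\(.*\)\n\1$/\1/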
Which means: look for lines where, after the newline, we have the same content as before, and replace both with only what we captured in the first line.
uniq is definitely easier, though.

Here is an awk implementation, in case the environment does not have / allow Perl (haven't seen one yet)! PS: if a line has more than one duplicate, then this prints duplicate output.
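A rough version of that awk filter (a sketch only; input.txt and deduped.txt are placeholder names, and it assumes awk can write to /dev/stderr, as gawk, mawk and BSD awk can):

    awk '{
        if (seen[$0]) {
            # already seen: note the repeat on stderr and drop it from the output
            print "duplicate: " $0 > "/dev/stderr"
        } else {
            # first occurrence: keep it
            print
        }
        seen[$0] = 1
    }' input.txt > deduped.txt

A line that occurs three times produces two such notices on stderr, while the deduplicated output keeps only the first copy.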
This is complementary to the uniq answers, which work great if you don't mind sorting your file first. If you need to remove non-adjacent duplicate lines (or if you want to remove duplicates without rearranging your file), the following Perl one-liner should do it (stolen from here):
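A sketch of that kind of one-liner (the hash name %seen is arbitrary, and the file names are placeholders):

    perl -ne 'print unless $seen{$_}++' input.txt > deduped.txt

It prints each line only the first time it appears, so the original order is preserved and nothing is ever sorted.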
uniq(1)

SYNOPSIS
    uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION
    Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output). Note: uniq does not detect repeated lines unless they are adjacent.
Or, if you want to remove non-adjacent duplicate lines as well, this fragment of perl will do it:
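Something along these lines (%seen is just an illustrative name; it reads standard input or the files given on the command line and writes to standard output):

    my %seen;
    while (<>) {
        # print a line only the first time it appears
        print unless $seen{$_}++;
    }

Save it as, say, dedupe.pl and run it as perl dedupe.pl input.txt > deduped.txt.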
If you are interested in removing adjacent duplicate lines, use uniq. If you want to remove all duplicate lines, not just adjacent ones, then it's trickier.
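One way to handle that trickier, non-adjacent case with nothing but standard tools is to number the lines, deduplicate on the content, and then restore the original order. A sketch, assuming GNU coreutils, with input.txt and deduped.txt as placeholders:

    cat -n input.txt |           # prefix every line with its line number
        sort -uk2 |              # sort on the content (field 2 onward), keeping one copy of each
        sort -nk1 |              # put the survivors back into the original order
        cut -f2- > deduped.txt   # strip the line numbers again

Note that sort -u does not strictly guarantee which of the identical copies is kept, so a deduplicated line may occasionally land at the position of a later copy rather than the first.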