I have two unsorted files which have some lines in common.
file1.txt
Z
B
A
H
L
file2.txt
S
L
W
Q
A
The way I'm removing the common lines right now is the following:
sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt
comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt
Output:
B
H
Z
The problem is that I want to keep the original order of file1.txt, i.e.:
Desired output:
Z
B
H
One solution I thought of is looping over every line of file2.txt and running:
sed -i "/^${line_file2}$/d" file1.txt
(with double quotes, so the variable actually expands). But if the files are big, performance may suffer.
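The loop idea above could be sketched like this (a rough sketch assuming GNU sed for `-i`; note it runs sed once per line of file2.txt, so it is O(n·m) and slow for large files, and it breaks if a line contains regex metacharacters):

```shell
# Sample files from the question.
printf 'Z\nB\nA\nH\nL\n' > file1.txt
printf 'S\nL\nW\nQ\nA\n' > file2.txt

# Delete each line of file2.txt from file1.txt, one sed invocation per line.
while IFS= read -r line_file2; do
  # double quotes so the shell expands the variable (single quotes would not)
  sed -i "/^${line_file2}$/d" file1.txt
done < file2.txt

cat file1.txt
```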
- Do you like my idea?
- Do you have any alternative to do it?
You can use just grep (-v for invert match, -f to read patterns from a file). Grep the lines from input1 that do not match any line in input2:

grep -v -f input2 input1

Gives:

Z
B
H
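One caveat worth knowing: plain grep -f treats each line of the pattern file as a regular expression and matches it anywhere in the line, so "A" would also remove "BANANA". For literal, whole-line matching, add -F (fixed strings) and -x (match entire line). A sketch using the question's file names:

```shell
# Sample files from the question.
printf 'Z\nB\nA\nH\nL\n' > file1.txt
printf 'S\nL\nW\nQ\nA\n' > file2.txt

# -v invert match, -f read patterns from file,
# -F fixed strings (no regex), -x whole-line matches only.
grep -v -F -x -f file2.txt file1.txt
# prints:
# Z
# B
# H
```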
I've written a little Perl script that I use for this kind of thing. It can do more than what you ask for but it can also do what you need:
In your case, you would run it as
The -f option makes it compare only the first word (defined by whitespace) of file2 and greatly speeds things up. To compare the entire line, remove the -f.

grep or awk:
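The awk command itself was not preserved here, but a common idiom for this task (my sketch, not necessarily this answer's exact command) loads file2 into a hash and filters file1, keeping file1's order:

```shell
# Sample files from the question.
printf 'Z\nB\nA\nH\nL\n' > file1.txt
printf 'S\nL\nW\nQ\nA\n' > file2.txt

# While reading the first argument (NR==FNR), remember each line in "seen";
# for the second argument, print only the lines not in "seen".
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' file2.txt file1.txt
# prints:
# Z
# B
# H
```

This is a single pass over each file, so it stays fast on big inputs, and it compares whole lines exactly (no regex surprises).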