I am trying to use comm to compute the difference between two sorted files, but the result doesn't make sense. What's wrong? I want to show the strings that exist in test2 but not in test1, and then show the strings that exist in test1 but not in test2.
>test1
a
b
d
g
>test2
e
g
k
p
>comm test1 test2
a
b
d
e
g
g
k
p
To show the lines that exist in test2 but not in test1, write either of these:
comm -13 test1 test2
comm -23 test2 test1
(-1 hides the column with lines that exist only in the first file; -2 hides the column with lines that exist only in the second file; -3 hides the column with lines that exist in both files.)
And vice versa to show the lines that exist in test1 but not in test2: swap either the options or the file order.
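As a rough sketch of both directions, assuming the two g lines are byte-for-byte identical (no trailing space) and using a scratch directory from mktemp purely for illustration:

```shell
# Recreate the two sorted sample files in a scratch directory (illustrative location).
workdir=$(mktemp -d)
printf '%s\n' a b d g > "$workdir/test1"
printf '%s\n' e g k p > "$workdir/test2"

# Lines only in test2: hide column 1 (only in test1) and column 3 (in both).
only_in_test2=$(comm -13 "$workdir/test1" "$workdir/test2")

# Lines only in test1: hide column 2 (only in test2) and column 3 (in both).
only_in_test1=$(comm -23 "$workdir/test1" "$workdir/test2")

printf 'only in test2:\n%s\n' "$only_in_test2"
printf 'only in test1:\n%s\n' "$only_in_test1"
```

With clean input like this, g is treated as common and drops out of both results.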
Note that g on a line by itself is considered distinct from g with a space after it, which is why you get
g
g
instead of
g
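If trailing whitespace is the culprit, one possible workaround is to strip it and re-sort before comparing. This is only a sketch: which file carries the stray space is an assumption here (test1), and the .clean file names are made up for the example.

```shell
workdir=$(mktemp -d)
# Assumption for this example: test1 has a trailing space after g.
printf '%s\n' a b d 'g ' > "$workdir/test1"
printf '%s\n' e g k p > "$workdir/test2"

# Make line endings visible: sed's l command marks each end of line with $,
# so the stray space shows up as "g $".
sed -n l "$workdir/test1"

# Strip trailing whitespace, then re-sort so comm still gets sorted input.
sed 's/[[:space:]]*$//' "$workdir/test1" | sort > "$workdir/test1.clean"
sed 's/[[:space:]]*$//' "$workdir/test2" | sort > "$workdir/test2.clean"

# Show only column 3 (lines common to both files).
common=$(comm -12 "$workdir/test1.clean" "$workdir/test2.clean")
printf 'common lines:\n%s\n' "$common"
```

After cleaning, g should be reported once as a common line rather than twice as two different lines.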
Add a line that is common to both files, say 'z' at the end of each. You'll see a third column appear, indicating that the value is common to both files.
The output is meant to show lines unique to file1 in column 1 and lines unique to file2 in column 2.
Finally, the options -1, -2, and -3 each suppress the column with that number; for example, -1 suppresses column 1.
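To see the column layout for yourself, something like the following should work. It is a sketch that assumes clean lines with no trailing spaces and adds the 'z' line to both files; the scratch directory is just for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' a b d g z > "$workdir/test1"
printf '%s\n' e g k p z > "$workdir/test2"

# No options: all three columns are printed.
# Column 1 has no indent, column 2 is indented by one tab, column 3 by two tabs.
layout=$(comm "$workdir/test1" "$workdir/test2")
printf '%s\n' "$layout"

# Suppress column 1 only: the remaining columns shift left by one tab.
without_col1=$(comm -1 "$workdir/test1" "$workdir/test2")
printf '%s\n' "$without_col1"
```

The indentation is the only thing that distinguishes the columns, so it is easy to misread the output when the tabs are lost in copy and paste.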
I hope this helps.