I've used the comm command to compare two files, but I'm unable to redirect its output to a third file:
comm file1 file2 > file3
comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
How do I do this? The files are sorted already.
(comm file1 file2 on its own works and prints the result.)
Sample input:
file1:
21
24
31
36
40
87
105
134
...
file2:
10
21
31
36
40
40
87
103
...
comm file1 file2: works
comm file1 file2 > file3
comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
You've sorted numerically; comm works on lexically sorted files. For instance, in file2, the line 103 is dramatically out of order with respect to the lines 21..87. Your files must be sorted the way plain sort (without -n) sorts them.
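To see the difference, here is a minimal sketch that sorts the numbers from file2 both ways. It assumes a POSIX shell; setting LC_ALL=C is my assumption, used so that "lexical" means plain byte order, and demo.txt is a hypothetical file name.

```shell
#!/bin/sh
# Show numeric order (sort -n) next to the lexical order that comm checks for.
# Assumption: LC_ALL=C pins byte-wise collation for sort.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical demo file holding the numbers from the question's file2
printf '%s\n' 10 21 31 36 40 40 87 103 > demo.txt

numeric=$(sort -n demo.txt | paste -s -d ' ' -)
lexical=$(sort demo.txt | paste -s -d ' ' -)
echo "sort -n: $numeric"   # sort -n: 10 21 31 36 40 40 87 103
echo "sort:    $lexical"   # sort:    10 103 21 31 36 40 40 87
```

In byte order "103" sorts before "21" because the comparison is character by character, and '1' is less than '2'.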
If you've got bash, you can use process substitution:
comm <(sort file1) <(sort file2) > file3
This runs the two sort commands and lets the comm process read their standard output as if it were a pair of files.
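In a shell without process substitution (plain POSIX sh, for instance), a similar effect is possible because POSIX comm reads standard input when one of its operands is -. A minimal sketch, assuming the question's sample numbers and LC_ALL=C (my assumption, so sort and comm use the same collation); file1.sorted is a hypothetical intermediate name, and the original files are left untouched.

```shell
#!/bin/sh
# Compare two unsorted files without process substitution and save the result in file3.
# Assumption: LC_ALL=C so that sort and comm agree on the collation order.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 21 24 31 36 40 87 105 134 > file1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

# One side goes through a sorted copy (hypothetical name file1.sorted) ...
sort file1 > file1.sorted
# ... the other is piped in; "-" tells comm to read that operand from standard input.
sort file2 | comm file1.sorted - > file3

cat file3
```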
Failing that:
(
sort -o file1 file1 &
sort -o file2 file2 &
wait
comm file1 file2
)
This sorts both files at the same time, in parallel, overwriting each one with its sorted version. The sub-shell (the ( ... ) around the commands) ensures that wait doesn't end up waiting for other background processes you already had running to finish.
Your sample data is NOT sorted lexicographically (as in a dictionary), which is the order comm expects and the order sort produces without the -n option; in that order, for example, 100 comes before 20.
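A quick way to check whether a file is already in the order comm expects is sort -C, which (per POSIX) exits with status 0 if the file is sorted and non-zero otherwise, without printing anything. A minimal sketch, assuming LC_ALL=C and the question's sample file2; the helper name check_sorted is hypothetical.

```shell
#!/bin/sh
# Report whether a file is in the lexical order that comm expects.
# Assumption: LC_ALL=C, so "sorted" means plain byte order.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2   # numerically sorted, as in the question

# Hypothetical helper: sort -C checks the order quietly and signals the result via its exit status
check_sorted() {
    if sort -C "$1"; then
        echo "$1: sorted"
    else
        echo "$1: NOT sorted for comm"
    fi
}

check_sorted file2      # file2: NOT sorted for comm ("103" sorts before "87")
sort -o file2 file2     # re-sort in place, lexically
check_sorted file2      # file2: sorted
```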
Are you sure you aren't simply missing the error message when you don't redirect the output? Without the redirection, the error is intermixed with the output lines on the terminal.
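One way to tell the two apart is to redirect the streams separately: the comparison goes to standard output, and any "not in sorted order" message goes to standard error. A minimal sketch using the question's numerically sorted sample data; errors.txt is a hypothetical file name. Whether a message appears at all depends on the comm implementation: GNU coreutils comm checks the order by default, while some other implementations do not check.

```shell
#!/bin/sh
# Keep comm's diagnostics (stderr) separate from its results (stdout).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 21 24 31 36 40 87 105 134 > file1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

status=0
# Results go to file3; any ordering complaint goes to errors.txt (hypothetical name).
comm file1 file2 > file3 2> errors.txt || status=$?

echo "comm exit status: $status"
echo "messages from comm:"
cat errors.txt
```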
You have to sort the files first with the sort program.
Try:
sort -o file1 file1
sort -o file2 file2
comm file1 file2 > file3
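Here is that sequence run end to end on the sample numbers, as a minimal sketch; the temporary directory and LC_ALL=C are my assumptions, the latter so that sort and comm use the same collation. POSIX allows sort -o to name one of the input files, so each file is replaced by its sorted version.

```shell
#!/bin/sh
# Sort both inputs in place, then compare them and save the result in file3.
# Assumption: LC_ALL=C so sort and comm agree on byte-wise collation.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 21 24 31 36 40 87 105 134 > file1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

sort -o file1 file1     # overwrites file1 with its lexically sorted contents
sort -o file2 file2
comm file1 file2 > file3

# Column 1: only in file1; column 2 (one tab): only in file2; column 3 (two tabs): in both.
cat file3
```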
I don't get the same results as you, but perhaps your version of comm is complaining that the files are not sorted lexically. Here is what I get with the input you provided (the ... lines make it interesting; I know they're not part of your actual files):
$ comm file[12]
	10
		21
24
		31
		36
		40
	40
		87
	103
	...
105
134
...
I was surprised that ... wasn't in the third column, so I tried:
$ comm <(sort file1) <(sort file2)
		...
	10
	103
105
134
		21
24
		31
		36
		40
	40
		87
That's better, but 105 > 24, right?
$ comm <(sort -n file1) <(sort -n file2)
		...
	10
		21
24
		31
		36
		40
	40
		87
	103
105
134
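I think those are the results you're looking for, although comm still compares lines lexically, so numerically sorted input only happens to pair up correctly here (GNU comm may also warn that the input is not in sorted order). The two 40s are also interesting. If you want to eliminate the duplicate: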
$ comm <(sort -nu file1) <(sort -nu file2)
		...
	10
		21
24
		31
		36
		40
		87
	103
105
134
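If the aim is the de-duplicated comparison above, a variant that stays within the lexical order comm checks for is to de-duplicate with plain sort -u and only re-sort numerically for display afterwards. A minimal sketch, assuming LC_ALL=C and using comm -12 (lines present in both files) as an example final step; file1.u and file2.u are hypothetical names.

```shell
#!/bin/sh
# De-duplicate with lexical sort -u (the order comm expects), then sort the result numerically for reading.
# Assumption: LC_ALL=C so sort and comm share one collation.
export LC_ALL=C
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 21 24 31 36 40 87 105 134 > file1
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

sort -u file1 > file1.u     # hypothetical intermediate file names
sort -u file2 > file2.u     # the duplicate 40 is dropped here
# -12 suppresses columns 1 and 2, leaving only lines present in both files
common=$(comm -12 file1.u file2.u | sort -n | paste -s -d ' ' -)
echo "in both: $common"     # in both: 21 31 36 40 87
```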
I ran into a similar issue, where comm was complaining even though I had run sort. The problem was that I was running Cygwin, and sort pointed to some MS-DOS/Windows version (I guess). Using the full path (C:\Cygwin\bin\sort in my case) fixed it.
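To check which sort (and comm) the shell actually runs, the POSIX command -v prints the path it resolves each name to. On Cygwin, a path under /cygdrive/c/Windows rather than /usr/bin would likely indicate that the Windows sort is being picked up. A minimal sketch:

```shell
#!/bin/sh
# Print the path of the sort and comm executables that this shell will run.
for tool in sort comm; do
    path=$(command -v "$tool") || path="(not found)"
    echo "$tool -> $path"
done
```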