How do I pipe comm outputs to a file?

Posted 2020-07-29 03:30

Question:

I've used the comm command to compare two files, but I'm unable to redirect its output to a third file:

comm file1 file2 > file3 

comm: file 1 is not in sorted order
comm: file 2 is not in sorted order

How do I do this? The files are sorted already.

(comm file1 file2 works and prints it out)

sample input:
file1:

21
24
31
36
40
87
105
134
...

file2:

10
21
31
36
40
40
87
103
...

comm file1 file2: works

comm file1 file2 > file3 

comm: file 1 is not in sorted order
comm: file 2 is not in sorted order

Answer 1:

You've sorted numerically; comm works on lexically sorted files.

For instance, in file2, the line 103 is dramatically out of order with the lines 21..87. Your files must be 'plain sort sorted'.
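
A quick way to see what "plain sort sorted" means is to ask sort itself to check the file. This is only a sketch: the temporary directory is made up for illustration, the file holds just the lines shown in the question, and LC_ALL=C is an assumption to keep the ordering independent of your locale.

```shell
export LC_ALL=C   # assumption: byte-wise collation, independent of your locale
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The visible lines of file2 from the question (without the trailing "...").
printf '%s\n' 10 21 31 36 40 40 87 103 > file2

# sort -c only checks the order; it exits non-zero at the first out-of-order line.
if sort -c file2 2>/dev/null; then
    echo "file2 is lexically sorted"
else
    echo "file2 is NOT lexically sorted"
fi

# The same check with -n passes, which is why the file looks sorted to a human.
if sort -c -n file2 2>/dev/null; then
    echo "file2 is numerically sorted"
fi
```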

If you've got bash, you can use process substitution:

comm <(sort file1) <(sort file2)

This runs the two commands and ensures that the comm process gets to read their standard output as if they were files.
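
Since the question was about getting the result into a third file, here is the same idea with the redirection added. Treat it as a sketch: the temporary directory and the shortened file contents are invented for the example, and LC_ALL=C is an assumption so that sort and comm agree on the collation order.

```shell
#!/usr/bin/env bash
export LC_ALL=C                      # assumption: same collation for sort and comm
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 21 24 31 105 > file1   # numerically sorted, not lexically
printf '%s\n' 10 21 31 103 > file2

# <(...) is a bash feature; a plain POSIX sh may reject it.
# The original files are left untouched; only the sorted streams reach comm.
comm <(sort file1) <(sort file2) > file3

cat file3
```

With lexically sorted input comm has nothing to complain about, so file3 should contain only the three columns.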

Failing that:

(
sort -o file1 file1 &
sort -o file2 file2 &
wait
comm file1 file2
)

This uses parallelism to get the files sorted at the same time. The sub-shell (in ( ... )) ensures that you don't end up waiting for other background processes to finish.
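
Note that sort -o file1 file1 overwrites the originals. If you would rather keep them, a variant along these lines should work in any POSIX shell. It is a sketch only: the .sorted file names, the temporary directory and the sample data are made up for the example.

```shell
export LC_ALL=C                      # assumption: same collation for sort and comm
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 21 24 31 105 > file1
printf '%s\n' 10 21 31 103 > file2

(
    sort file1 > file1.sorted &      # both sorts run in the background...
    sort file2 > file2.sorted &
    wait                             # ...and wait blocks until both have finished
    comm file1.sorted file2.sorted
) > file3                            # everything the sub-shell prints lands in file3

cat file3
```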



Answer 2:

Your sample data is NOT sorted lexicographically (as in a dictionary), which is what commands like comm and sort (without the -n option) expect. In lexicographic order, for example, 100 comes before 20.
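
To make the difference concrete, here is a tiny sketch that sorts the same three numbers both ways. The variable names are just for illustration, and LC_ALL=C is an assumption to pin down the collation.

```shell
export LC_ALL=C   # assumption: byte-wise collation

# Same input, two orderings; paste joins the lines with spaces for display.
lexical=$(printf '%s\n' 20 100 3 | sort | paste -sd ' ' -)
numeric=$(printf '%s\n' 20 100 3 | sort -n | paste -sd ' ' -)

echo "sort:    $lexical"    # 100 20 3
echo "sort -n: $numeric"    # 3 20 100
```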

Are you sure the error message isn't also printed when you don't redirect the output? It is written to standard error, so on the terminal it would be intermixed with the output lines and easy to miss.
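
One way to check is to send the two streams to different files, since diagnostics go to standard error and are not captured by a plain > redirection. A sketch, with an invented temporary directory, shortened sample data and a hypothetical errors.txt; whether comm complains at all depends on the implementation, so the second file may well stay empty.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 21 24 105 > file1   # numeric order, so not lexically sorted
printf '%s\n' 10 21 103 > file2

# stdout (the three columns) goes to file3, stderr (any complaints) to errors.txt.
comm file1 file2 > file3 2> errors.txt || echo "comm exited non-zero"

echo "--- diagnostics, if any ---"
cat errors.txt
```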



Answer 3:

You have to sort the files first with the sort program.



Answer 4:

Try:

sort -o file1 file1
sort -o file2 file2
comm file1 file2 > file3


Answer 5:

I don't get the same results as you, but perhaps your version of comm is complaining that the files are not sorted lexically. Here is what I get using the input you provided (the ... makes it interesting; I know it's not part of your actual files):

$ comm file[12]
        10
                21
24
                31
                36
                40
        40
                87
        103
        ...
105
134
...

I was surprised that ... wasn't in the third column, so I tried:

$ comm <(sort file1) <(sort file2)
                ...
        10
        103
105
134
                21
24
                31
                36
                40
        40
                87

That's better, but 105 > 24, right?

$ comm <(sort -n file1) <(sort -n file2)
                ...
        10
                21
24
                31
                36
                40
        40
                87
        103
105
134

I think those are the results you were looking for. The two 40s are also interesting. If you want to eliminate the duplicate:

$ comm <(sort -nu file1) <(sort -nu file2)
                ...
        10
                21
24
                31
                36
                40
                87
        103
105
134


Answer 6:

I ran into a similar issue, where comm was complaining even though I had run sort. The problem was that I was running Cygwin, and sort pointed to some MSDOS version (I guess). By using the specific path (C:\Cygwin\bin\sort in my case), it worked.
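
If you suspect the same thing, you can ask the shell which sort it actually picks up. This is only a sketch: sort_path is a name made up for the example, and the --version probe is an assumption that works for GNU sort but may print nothing for other implementations.

```shell
# Which sort does the shell find first on PATH?
sort_path=$(command -v sort)
echo "sort resolves to: $sort_path"

# GNU sort answers --version; other implementations may not, so errors are hidden
# and stdin is closed off to make sure nothing waits for input.
"$sort_path" --version </dev/null 2>/dev/null | head -n 1
```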



Tags: unix