Question:

Is there a difference in the order of uniq and sort when calling them in a shell script? I'm asking about both time and space.

grep 'somePattern' | uniq | sort

vs.

grep 'somePattern' | sort | uniq

A quick test on a text file of 140k lines showed a slight speed improvement (5.5 s vs 5.0 s) for the first method (get unique values, then sort).

I don't know how to measure memory usage, though.

The question now is: does the order make a difference? Or does it depend on the lines grep returns (many or few duplicates)?

I'm looking forward to your answers.

Answer 1:

The only correct order is to call uniq after sort, since the man page for uniq says:

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

Therefore it should be

grep 'somePattern' | sort | uniq
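To see why the order matters for correctness and not only for speed, here is a minimal sketch. The four-line input and the variable names are made up for illustration. The duplicates are deliberately not adjacent, so uniq on its own removes nothing.

```shell
# Made-up sample input: the duplicate lines are NOT next to each other.
input=$(printf 'b\na\nb\na\n')

# uniq first: no two neighbouring lines match, so all four lines survive
# and sort merely reorders them.
uniq_then_sort=$(printf '%s\n' "$input" | uniq | sort)

# sort first: equal lines become neighbours, so uniq can drop the repeats.
sort_then_uniq=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq | sort gives:\n%s\n' "$uniq_then_sort"
printf 'sort | uniq gives:\n%s\n' "$sort_then_uniq"
```

With this input the first pipeline still prints four lines, while the second prints two. So the two pipelines only agree when the input's duplicates already happen to be adjacent.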

Answer 2:

I believe sort -u is suited to this exact scenario: it sorts and removes duplicates in one step. That should be more efficient than calling sort and uniq as two separate processes, since it saves one process and one pipe.
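As a rough check, the sketch below compares sort -u with sort | uniq on a small made-up input (the fruit names and variable names are only for illustration). LC_ALL=C is set here as an assumption to force plain byte-wise comparison, because in some locales sort -u can treat lines as equal that uniq considers different.

```shell
# Made-up sample input with scattered duplicates.
input=$(printf 'pear\napple\npear\nfig\napple\n')

# Two processes: sort, then drop adjacent duplicates.
two_step=$(printf '%s\n' "$input" | LC_ALL=C sort | LC_ALL=C uniq)

# One process: sort and drop duplicates in the same pass.
one_step=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

if [ "$one_step" = "$two_step" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```

Note that uniq is still needed when you want its extra options, such as -c to count occurrences, which sort -u does not provide.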

Answer 3:

uniq depends on the input being sorted in order to remove all duplicates (it only compares each line with the previous one), which is why sort is always run before uniq. Try it and see.