Is there a difference in the order of uniq and sort when calling them in a shell script? I mean in terms of time and space.
grep 'somePattern' | uniq | sort
vs.
grep 'somePattern' | sort | uniq
A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get unique values, then sort).
I don't know how to measure memory usage, though.
So the question is: does the order make a difference, or does it depend on the lines grep returns (many or few duplicates)?
I'm looking forward to your answers.
The only correct order is to call uniq after sort, since the man page for uniq says:
Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
Therefore it should be
grep 'somePattern' | sort | uniq
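A minimal demonstration of why the order matters, assuming a POSIX shell with the standard sort and uniq. The three-line input is a made-up example rather than real grep output: it contains a duplicate ("b") that is not adjacent, so uniq | sort keeps both copies while sort | uniq removes one.

```shell
#!/bin/sh
# Made-up input with a non-adjacent duplicate: "b" is on lines 1 and 3.
input='b
a
b'

# uniq first: the two "b" lines are not next to each other, so uniq keeps both.
printf '%s\n' "$input" | uniq | sort
# prints a, b, b (one per line)

echo ---

# sort first: the "b" lines become adjacent, so uniq collapses them.
printf '%s\n' "$input" | sort | uniq
# prints a, b (one per line)
```

So the uniq | sort version can be faster simply because it does less work, but its output may still contain duplicates.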
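I believe sort -u is suited to this exact scenario: it both sorts and removes duplicate lines. It is likely more efficient than calling sort and uniq separately in either order, since everything happens in one process and the full sorted output doesn't have to be piped to a second program.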
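A quick check that sort -u produces the same result as sort | uniq on a small made-up input. This assumes the default whole-line sort key; with -k or other key options, sort -u only compares the key, so its output can differ from piping through uniq.

```shell
#!/bin/sh
# Made-up input with a non-adjacent duplicate.
input='b
a
b'

# sort -u: sort and drop duplicate lines in a single process.
with_flag=$(printf '%s\n' "$input" | sort -u)

# The two-process pipeline from the answer above.
with_pipe=$(printf '%s\n' "$input" | sort | uniq)

printf '%s\n' "$with_flag"
if [ "$with_flag" = "$with_pipe" ]; then
    echo "same output"
fi
```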
uniq depends on its input being sorted to remove all duplicates (it only compares each line with the one immediately before it), which is why sort is run before uniq. Try it and see.
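To make the "compares the previous and current item" point concrete, here is a rough awk sketch of that rule. adjacent_uniq is a hypothetical helper written for illustration only; it is not how uniq is actually implemented, it just prints a line when it differs from the line right before it.

```shell
#!/bin/sh
# Hypothetical helper: mimics uniq's core rule by printing a line only
# when it differs from the previous line. Appending "" forces awk to
# compare the lines as strings (so "1" and "1.0" stay distinct).
adjacent_uniq() {
    awk 'NR == 1 || $0 "" != prev "" { print } { prev = $0 }'
}

# Unsorted: the two "b" lines are separated by "a", so both survive.
printf 'b\na\nb\n' | adjacent_uniq

# Sorted: the duplicates are now adjacent, so only one "b" is kept.
printf 'b\na\nb\n' | sort | adjacent_uniq
```

Because the helper only ever remembers one previous line, it needs almost no memory, and that is exactly why it cannot catch duplicates that sort hasn't brought together.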