I have a very large file (many gigabytes) that looks like this:
input.txt
a|textA|2
c|textB|4
b|textC|5
e|textD|1
d|textE|4
b|textF|5
As a first step, I want to sort the lines numerically by the third column in descending order. If lines have the same value in the third column, they must be sorted by the text of the first column in ascending order. And if lines have equal values in both the 1st and 3rd columns, they must be sorted by the 2nd column in ascending order. The values in the second column are guaranteed to be unique.
So, I want the result to be:
desiredOutput.txt
b|textC|5
b|textF|5
c|textB|4
d|textE|4
a|textA|2
e|textD|1
I can take the first step:
sort -t\| -bfrnk3 path/to/input.txt > path/to/output.txt
But what are the next steps? Or can the result be achieved in a single pass?
EDIT
I tested sort -t '|' -k 3,3nr -k 1,1 -k 2,2 input.txt > output.txt. It gives the following "output.txt":
b|textF|5
b|textC|5
c|textB|4
d|textE|4
a|textA|2
e|textD|1
which is not what I want.
You can do it with the sort command alone, using -k 3,3nr -k 1,1 -k 2,2.
k3 tells sort to sort by the 3rd column, and similarly k1 and k2 sort by the 1st and 2nd columns respectively. The n in 3,3nr means numeric sorting, and the r means reverse. It seems that -k 1,1 -k 2,2 is optional, since I guess sort would sort in ascending order by default.
If this is UNIX:
You can use multiple -k flags to sort on more than one column. For example, to sort by 3rd column then 1st column as a tie breaker:
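A minimal sketch of that, assuming the pipe-delimited sample from the question is saved as input.txt. The file names, the n modifier on the third key, and the LC_ALL=C setting are assumptions for illustration, not part of the original answer:

```shell
# Recreate the sample input from the question (file names are illustrative).
printf '%s\n' 'a|textA|2' 'c|textB|4' 'b|textC|5' 'e|textD|1' 'd|textE|4' 'b|textF|5' > input.txt

# -t '|'  : fields are separated by a pipe character
# -k 3,3n : primary key is field 3 only, compared numerically (ascending)
# -k 1,1  : lines with equal field 3 are ordered by field 1 only
# LC_ALL=C forces byte-wise comparison so the result does not depend on the locale.
LC_ALL=C sort -t '|' -k 3,3n -k 1,1 input.txt > output.txt

cat output.txt
```

Writing each key as F,F (for example 3,3 rather than just 3) matters: a key given as -k 3 alone runs from field 3 to the end of the line, which is usually not what is wanted when there are later keys.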
Relevant options from "man sort":
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
POS is F[.C][OPTS], where F is the field number and C the character position in the field. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition.
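Combining these options for the ordering asked about in the question, a sketch might look like the following. It assumes a sort that accepts per-key OPTS as described above, and it sets LC_ALL=C so that the text keys are compared byte by byte. That locale setting is an assumption about the desired collation, and whether it accounts for the different order reported in the question's edit depends on the system in use:

```shell
# Recreate the sample input from the question (file names are illustrative).
printf '%s\n' 'a|textA|2' 'c|textB|4' 'b|textC|5' 'e|textD|1' 'd|textE|4' 'b|textF|5' > input.txt

# -k 3,3nr : field 3, numeric, reversed (descending)
# -k 1,1   : then field 1, ascending
# -k 2,2   : then field 2, ascending
# All three keys are applied in a single invocation of sort.
LC_ALL=C sort -t '|' -k 3,3nr -k 1,1 -k 2,2 input.txt > output.txt

cat output.txt
```

With the six sample lines this produces the order listed under desiredOutput.txt in the question.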