I have a script that outputs file paths (via find), which I want to sort based on very specific custom logic:
1st sort key: the 2nd and, if present, the 3rd "-"-separated field, sorted using a custom ordering based on a list of keys I supply, but excluding any numerical suffix.
With the sample input below, the list of keys is:
rp,alpha,beta-ri,beta-rs,RC
2nd sort key: numeric sorting by the trailing number on each line.
Given the following sample input (note that the /foo/bar/test/example/8.2.4.0 prefix of each line is incidental):
/foo/bar/test/example/8.2.4.0-RC10
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-rp2
I expect:
/foo/bar/test/example/8.2.4.0-rp2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC10
Using a variant of my answer to your original question:
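./your-script | awk -v keysInOrder='rp,alpha,beta-ri,beta-rs,RC' '
  BEGIN {
    FS = OFS = "-"
    # Map each key to its 1-based position in the list.
    keyCount = split(keysInOrder, a, ",")
    for (i = 1; i <= keyCount; ++i) keysToOrdinal[a[i]] = i
  }
  {
    # Sort key: field 2 (plus field 3, if present), minus the numeric suffix.
    sortKey = $2
    if (NF == 3) sortKey = sortKey FS $3
    sub(/[0-9]+$/, "", sortKey)
    # Split the trailing number off into a field of its own, so that it
    # always ends up as field 5 of the line printed below.
    auxFieldPrefix = "|" FS
    if (NF == 2) auxFieldPrefix = auxFieldPrefix FS
    sub(/[0-9]/, auxFieldPrefix "&", $NF)
    # Keys that are not in the list get an ordinal past the end of it.
    sortOrdinal = sortKey in keysToOrdinal ? keysToOrdinal[sortKey] : keyCount + 1
    print sortOrdinal, $0
  }
' | sort -t- -k1,1n -k3,3 -k5,5n | sed 's/^[^-]*-//; s/|-\{1,2\}//'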
./your-script represents whatever command produces the output you want to sort.
Note that an auxiliary character, |, is used to facilitate sorting; the assumption is that this character doesn't appear in the input, which should be reasonably safe, given that filesystem paths usually don't contain pipe characters.
Any field 2 values (sans numeric suffix) that aren't in the list of sort keys sort after the field 2/3 values that are, using alphabetic sorting among them.
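To make the role of the auxiliary character more concrete, here is a minimal sketch of the sort and cleanup stages only. The decorated lines are written out by hand in the form the awk stage above produces for four of the sample paths; the variable names decorated and sorted are just for illustration.

```shell
#!/bin/sh
# Hand-written decorated lines, as the awk stage above would emit them:
# "<ordinal>-<path>", with "|-" or "|--" inserted before the trailing number
# so that the number always lands in "-"-separated field 5.
decorated='5-/foo/bar/test/example/8.2.4.0-RC|--10
3-/foo/bar/test/example/8.2.4.0-beta-ri|-10
3-/foo/bar/test/example/8.2.4.0-beta-ri|-2
1-/foo/bar/test/example/8.2.4.0-rp|--2'

# Field 1 (the ordinal) sorts numerically and field 5 (the trailing number)
# breaks ties; sed then strips the ordinal and the "|-" / "|--" marker again.
sorted=$(printf '%s\n' "$decorated" |
  sort -t- -k1,1n -k3,3 -k5,5n |
  sed 's/^[^-]*-//; s/|-\{1,2\}//')

printf '%s\n' "$sorted"
```

The rp line comes out first, then beta-ri2 before beta-ri10, then RC10, with the original paths restored.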
While this does not match what the OP is looking for, it is worth pointing out that the sort command has a -V option for version sorting. It does the job by following the order of characters in the ASCII table (i.e. uppercase letters first, lowercase letters next).
For example:
% cat test.sort.txt
/foo/bar/test/example/8.2.4.0-RC10
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-rp2
And sorting:
% sort -V test.sort.txt
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC10
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-rp2
/foo/bar/test/example/8.2.4.0-rp10
So, it is useful to be aware of this when giving version names.
That said, if you insist, here is a one-liner that uses sed to enforce the custom order:
% sed -e 's/-rp/-x1xrp/;s/-alpha/-x2xalpha/;s/-beta-ri/-x3xbeta-ri/;s/-beta-rs/-x4xbeta-rs/;s/-RC/-x5xRC/' test.sort.txt | sort -V | sed -e 's/-x[1-5]x/-/'
/foo/bar/test/example/8.2.4.0-rp2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC10
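If the key list changes often, the tagging substitutions could be generated from the list instead of being typed by hand. The following is only a sketch of that idea, not part of the one-liner above: the names keys, tag_script and custom_sort are made up for illustration, and it assumes that the keys contain no characters special to sed or the shell and that sort supports -V.

```shell
#!/bin/sh
# Assumption: keys are plain words (no sed or shell metacharacters).
keys='rp,alpha,beta-ri,beta-rs,RC'

# Build one substitution per key: "-KEY<digits>" at the end of a line becomes
# "-x<ordinal>xKEY<digits>", mirroring the x1x ... x5x tags used above.
tag_script=''
i=0
old_ifs=$IFS
IFS=,
for k in $keys; do
  i=$((i + 1))
  tag_script="${tag_script:+$tag_script;}s/-$k\\([0-9]*\\)\$/-x${i}x$k\\1/"
done
IFS=$old_ifs

# Tag, version-sort, then remove the tag again (hypothetical helper name).
custom_sort() {
  sed "$tag_script" | sort -V | sed 's/-x[0-9][0-9]*x/-/'
}
```

Usage would then be along the lines of `custom_sort < test.sort.txt`. Because sort -V compares the digits in the tag numerically, this should keep working with more than nine keys.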
I found a solution totally different from what @mklement0 suggested.
#!/bin/bash
echo "Enter a version :"
read VERSION
while read -r line
do
    find "$line" -type d | grep "$VERSION" | sort -n >> outfile.txt
    grep '.*-alpha[0-9]' outfile.txt | sort -n >> outfile2.txt
    grep '.*-beta-ri[0-9]' outfile.txt | sort -n >> outfile2.txt
    grep '.*-beta-rs[0-9]' outfile.txt | sort -n >> outfile2.txt
    grep '.*-RC[0-9]' outfile.txt | sort -n >> outfile2.txt
    rm outfile.txt
done < whatever.txt
Content of outfile2.txt:
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-alpha8
/foo/bar/test/example/8.2.4.0-alpha9
/foo/bar/test/example/8.2.4.0-beta-ri1
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-rs1
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs3
/foo/bar/test/example/8.2.4.0-RC1
The only thing wrong with this is that alpha10 came before alpha8.
Any clue?
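One possible clue, offered as a sketch rather than a definitive diagnosis: sort -n only looks for a number at the start of each line (or key). These lines start with "/", so every line counts as 0 and the order falls back to a plain byte-by-byte comparison, in which "alpha10" sorts before "alpha8". The snippet below reproduces that on three of the lines and compares it with sort -V (assuming a sort that supports -V); the variable names are only illustrative.

```shell
#!/bin/sh
lines='/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-alpha8
/foo/bar/test/example/8.2.4.0-alpha9'

# No leading number on any line, so -n treats them all as 0 and the
# last-resort byte comparison decides: "1" sorts before "8".
numeric=$(printf '%s\n' "$lines" | LC_ALL=C sort -n)

# Version sort compares the embedded digit runs as numbers instead.
by_version=$(printf '%s\n' "$lines" | sort -V)

printf 'sort -n:\n%s\n\nsort -V:\n%s\n' "$numeric" "$by_version"
```

If that matches what you see, swapping sort -n for sort -V in the grep lines of the script might be enough to get alpha8, alpha9, alpha10.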