I want to sort lines that contain dates, but I'm having trouble figuring out how to sort them while keeping each line whole. I also don't understand how to use a pipe to sort the lines.
For example, my script receives this as a text file:
asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
I would like to read line by line:
while read line; do
done < "$1"
And, inside the loop, sort the lines by their dates. How can I sort whole lines, as they appear in the file, while reading them one by one?
What if I do this:
#!/bin/bash
PATH=${PATH[*]}:.
#filename: testScript
while read line; do
arr=( $line )
num_of_params=`echo ${#arr[*]}`
echo $line | sort -n -k$num_of_params
num_of_params=0
done < "$1"
My problem with this is that I send each line to sort on its own, rather than all the lines together, but I don't know any other way to do it (without temporary files, which I'd rather not use).
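The pipe can go after `done`: then everything the loop prints flows into a single `sort` process, which sees all lines at once and keeps each one whole. A minimal sketch, assuming the loop just passes each line through unchanged; the function name `loop_then_sort` is hypothetical, and it sorts numerically on the second field only to make the cross-line ordering visible (sorting by the date needs the key-building step from the answers).

```bash
#!/usr/bin/env bash
# Hypothetical demo: read stdin line by line, but pipe the loop's COMBINED
# output into one sort call, so sort compares all lines against each other.
loop_then_sort() {
  while IFS= read -r line; do
    printf '%s\n' "$line"        # emit the whole line, unchanged
  done | sort -k 2,2n            # one sort for all lines (2nd field, numeric)
}

loop_then_sort <<'EOF'
asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
EOF
```

No temporary file is involved: the lines travel through the pipe.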
Output:
asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
Desired output:
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
asdsa 24 asdsa 3 3000 054217542 30.3.2016
As you can see, it didn't work.
How can I fix that?
Here is a solution using a Schwartzian transform with awk and cut:
awk '{split($NF,arr,"."); printf("%d%02d%02d\t%s\n",arr[3],arr[2],arr[1],$0)}' infile |
sort -k 1,1 | cut -f 2-
The awk part first splits the last field of the record, $NF (the date), at the periods into an array arr:
split($NF,arr,".")
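To see what `split` produces for one line, the call can be run on its own. A small sketch; `date_parts` is a hypothetical helper name. `split` returns the number of pieces and fills `arr[1]` (day), `arr[2]` (month) and `arr[3]` (year); a single-character separator such as `"."` is taken literally, not as a regular expression.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the piece count and the three date parts
# of the last field of each input line.
date_parts() {
  awk '{ n = split($NF, arr, "."); print n, arr[1], arr[2], arr[3] }'
}

echo 'asdsa 24 asdsa 3 3000 054217542 30.3.2016' | date_parts   # prints: 3 30 3 2016
```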
The second part prints the line with the reformatted date prepended: first the year, then the month and the day, the latter two with zero padding to two digits:
printf("%d%02d%02d\t%s\n",arr[3],arr[2],arr[1],$0)
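Printing only the key (without the tab and the original line) makes the reordering of the date parts easy to check. A sketch using a hypothetical `date_key` helper with the same `split` and format string as the answer:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the trailing D.M.YYYY date of each line into YYYYMMDD.
date_key() {
  awk '{ split($NF, arr, "."); printf("%d%02d%02d\n", arr[3], arr[2], arr[1]) }'
}

# Prints 20160330, 20140102 and 20160306, one per line
date_key <<'EOF'
asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
EOF
```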
The output of this looks as follows:
20160330 asdsa 24 asdsa 3 3000 054217542 30.3.2016
20140102 asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
20160306 dsasda 23 dsada 4 3200 537358234 6.3.2016
Now we can just pipe to sort and use the first field:
sort -k 1,1
resulting in
20140102 asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
20160306 dsasda 23 dsada 4 3200 537358234 6.3.2016
20160330 asdsa 24 asdsa 3 3000 054217542 30.3.2016
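A plain string sort works here only because the key has a fixed width, which is what the zero padding guarantees. A sketch comparing padded and unpadded keys for 30.3.2016 and 6.3.2016; `sorted_keys` is a hypothetical helper, and `LC_ALL=C` is set so that the comparison is byte-wise.

```bash
#!/usr/bin/env bash
# Hypothetical demo: build keys with the given awk printf format and sort them as strings.
sorted_keys() {
  printf '%s\n' '30.3.2016' '6.3.2016' |
    awk -v fmt="$1" '{ split($0, a, "."); printf(fmt "\n", a[3], a[2], a[1]) }' |
    LC_ALL=C sort
}

echo 'zero-padded:'; sorted_keys '%d%02d%02d'   # 20160306 before 20160330 (correct)
echo 'unpadded:';    sorted_keys '%d%d%d'       # 2016330 before 201636 (wrong order)
```

Without padding, the keys have different lengths, and the character comparison at position six ("3" vs "6") puts 30 March ahead of 6 March.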
And finally, we remove our inserted field again with cut, leaving only everything from the second field on (cut splits on tabs by default, which is why the key was followed by a tab):
cut -f 2-
resulting in
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
asdsa 24 asdsa 3 3000 054217542 30.3.2016
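The three steps can be wrapped in one shell function that reads standard input, so the sort can sit at the end of any pipeline. A minimal sketch; the name `sort_by_date` is hypothetical, and, like the answer, it assumes the date is the last field in D.M.YYYY form.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sort stdin lines by a trailing D.M.YYYY date.
# awk prepends a YYYYMMDD key and a tab (decorate), sort orders by that key,
# and cut removes the key again (undecorate).
sort_by_date() {
  awk '{ split($NF, arr, "."); printf("%d%02d%02d\t%s\n", arr[3], arr[2], arr[1], $0) }' |
    sort -k 1,1 |
    cut -f 2-
}

sort_by_date <<'EOF'
asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
EOF
```

In a script, this could be called as `sort_by_date < "$1"`.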
A Bash solution
If instead of awk we want to use just Bash, we can do this:
#!/bin/bash
# Read each line into an array 'line'
while read -r -a line; do
# Find the number of array elements
nel=${#line[@]}
# Assign the last element of the array to 'date'
date=${line[nel-1]}
# Extract the month from the date with parameter expansion
month=${date#*.}
month=${month%.*}
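# Year and day need only one expansion step, which is done here directly;
# 10# forces base 10, so zero-padded values such as 08 are not read as octal
printf "%d%02d%02d\t%s\n" "$((10#${date##*.}))" "$((10#$month))" "$((10#${date%%.*}))" "${line[*]}"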
# Pipe result to sort, then remove the first column with cut
done < infile | sort -k 1,1 | cut -f 2-
The general idea is exactly the same: we add an extra column containing the reformatted date, sort by that and then remove it again.
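If the dates may be zero-padded (for example 08.09.2016, which the question's data does not contain, so this is an assumption about other input), the Bash variant needs care: bash arithmetic and the bash `printf` builtin both treat a leading 0 as an octal prefix, so 08 and 09 produce an error. A sketch of just the key-building step, using a hypothetical function `bash_date_key` with the same parameter expansions as the loop and forcing base 10 with the `10#` prefix:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a YYYYMMDD key from a D.M.YYYY (or DD.MM.YYYY) date.
bash_date_key() {
  local date=$1 month
  month=${date#*.}    # drop the day:  "08.09.2016" -> "09.2016"
  month=${month%.*}   # drop the year: "09.2016"    -> "09"
  # 10# makes bash read "08" and "09" as decimal instead of invalid octal
  printf '%d%02d%02d\n' "$((10#${date##*.}))" "$((10#$month))" "$((10#${date%%.*}))"
}

bash_date_key 08.09.2016   # prints 20160908
bash_date_key 2.1.2014     # prints 20140102
```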
Try
awk -F'[. ]+' '
{
printf "%d%02d%02d %s\n", $NF, $(NF-1), $(NF-2), $0
}' test | sort -n | cut -c10-
Here test is the name of your file. This relies on the date being the last part of each line, in the format given in the question, and on a four-digit year: cut -c10- removes exactly the eight-digit key and the space after it. (Tested on FreeBSD with (n)awk.)
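To avoid hard-coding the file name, the same one-liner can go into a function that passes its arguments to awk, which reads standard input when no file is given. A sketch; `sort_lines_by_date` is a hypothetical name, and the field separator `[. ]+` splits on runs of dots or spaces, so the last three fields are year, month and day.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sort lines by a trailing D.M.YYYY date.
# Usage: sort_lines_by_date [file...]   (reads stdin when no file is given)
sort_lines_by_date() {
  awk -F'[. ]+' '{ printf "%d%02d%02d %s\n", $NF, $(NF-1), $(NF-2), $0 }' "$@" |
    sort -n |
    cut -c10-    # drop the 8-digit key and the following space
}

sort_lines_by_date <<'EOF'
asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
EOF
```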