Sorting by date with variable number of columns

Posted 2019-09-17 15:31

I want to sort lines that each end with a date, but I can't figure out how to sort them while keeping each line whole. I also don't understand how to use a pipe to sort the lines.

For example, my script receives this as a text file:

asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016

I would like to read line by line:

while read line; do

done < "$1"

And inside the loop, sort the lines by their dates. How can I sort the lines as they are in the file while reading them one by one?

What if I do this:

#!/bin/bash

PATH=${PATH[*]}:.
#filename: testScript


while read line; do
    arr=( $line )
    num_of_params=`echo ${#arr[*]}`
    echo $line | sort -n -k$num_of_params

    num_of_params=0
done < "$1"

My problem with this is that I send each line to sort on its own rather than all the lines together, but I don't know any other way to do it (I don't want to use temp files).

Output:

asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016

Desired output:

asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
asdsa 24 asdsa 3 3000 054217542 30.3.2016

As you can see, it didn't work.

How can I fix that?

Tags: bash shell unix
2 answers
Deceive 欺骗
#2 · 2019-09-17 15:44

Try

awk -F'[. ]+' '
{
   printf "%d%02d%02d %s\n", $NF, $(NF-1), $(NF-2), $0
}' test | sort -n | cut -c10-

test is the name of your file, of course. This relies on the date being the last field of each line, in the format shown in your question. (Tested on FreeBSD with (n)awk.)
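To see what the awk step produces for a single line, here is a small sketch. The sample line is taken from the question, and the separator is written here as '[. ]+' (one or more dots or spaces); the variable names are only for illustration.

```shell
#!/bin/bash
# Sample line from the question
line='asdsa 24 asdsa 3 3000 054217542 30.3.2016'

# Split on runs of dots and spaces; the last three fields are then
# year ($NF), month ($(NF-1)) and day ($(NF-2)).
key=$(printf '%s\n' "$line" |
    awk -F'[. ]+' '{ printf "%d%02d%02d\n", $NF, $(NF-1), $(NF-2) }')

echo "$key"    # prints 20160330
```

Assuming four-digit years, the key is always 8 digits, so together with the separating space the prefix is 9 characters wide. That is why cut -c10- gives back the original line.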

再贱就再见
#3 · 2019-09-17 16:07

Here is a solution using a Schwartzian transform with awk and cut:

awk '{split($NF,arr,"."); printf("%d%02d%02d\t%s\n",arr[3],arr[2],arr[1],$0)}' infile |
sort -k 1,1 | cut -f 2-

The awk part first splits the last field of the record, $NF (the date), at the periods into an array arr:

split($NF,arr,".")

The second part prints the line with the reformatted date prepended: first the year, then the month and the day, the latter two with zero padding to two digits:

printf("%d%02d%02d\t%s\n",arr[3],arr[2],arr[1],$0)
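The zero padding matters because the keys are compared as strings later on. A quick sketch with hand-built keys for 30.3.2016 and 6.3.2016 (the variable names are made up for this example) shows what would probably go wrong without it:

```shell
#!/bin/bash
# Keys built WITHOUT zero padding: 30.3.2016 -> 2016330, 6.3.2016 -> 201636
unpadded=$(printf '%s\n' 2016330 201636 | LC_ALL=C sort | head -n 1)

# Keys built WITH zero padding: 30.3.2016 -> 20160330, 6.3.2016 -> 20160306
padded=$(printf '%s\n' 20160330 20160306 | LC_ALL=C sort | head -n 1)

echo "unpadded: earliest key is $unpadded"   # 2016330, i.e. 30 March (wrong)
echo "padded:   earliest key is $padded"     # 20160306, i.e. 6 March (right)
```

With fixed-width keys, string order and chronological order coincide, so a plain sort on the first field is enough.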

The output of this looks as follows:

20160330        asdsa 24 asdsa 3 3000 054217542 30.3.2016
20140102        asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
20160306        dsasda 23 dsada 4 3200 537358234 6.3.2016

Now we can just pipe to sort and use the first field:

sort -k 1,1

resulting in

20140102        asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
20160306        dsasda 23 dsada 4 3200 537358234 6.3.2016
20160330        asdsa 24 asdsa 3 3000 054217542 30.3.2016

And finally, we remove our inserted field again with cut, leaving only everything from the second field on:

cut -f 2-

resulting in

asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016
asdsa 24 asdsa 3 3000 054217542 30.3.2016
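Since the question asks for a script that gets the file name as "$1" and avoids temp files, the three stages could be wrapped roughly like this. This is only a sketch: sort_by_date is a made-up name, and the sample data is copied from the question.

```shell
#!/bin/bash
# sort_by_date: hypothetical helper name, not part of the answer above.
# Reads the files given as arguments, or standard input if there are none.
sort_by_date() {
    # decorate: prepend YYYYMMDD and a tab; sort on that key; undecorate
    awk '{ split($NF, arr, "."); printf("%d%02d%02d\t%s\n", arr[3], arr[2], arr[1], $0) }' "$@" |
        sort -k 1,1 | cut -f 2-
}

# Sample data from the question
sample='asdsa 24 asdsa 3 3000 054217542 30.3.2016
asdsadsa 25 asdsadsaa 5 4500 534215365 2.1.2014
dsasda 23 dsada 4 3200 537358234 6.3.2016'

result=$(printf '%s\n' "$sample" | sort_by_date)
printf '%s\n' "$result"
```

In a script, the last two lines would simply become sort_by_date "$1". Everything flows through pipes, so no temp file is involved.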

A Bash solution

If instead of awk we want to use just Bash, we can do this:

#!/bin/bash

# Read each line into an array 'line'
while read -r -a line; do

    # Find the number of array elements
    nel=${#line[@]}

    # Assign the last element of the array to 'date'
    date=${line[nel-1]}

    # Extract the month from the date with parameter expansion
    month=${date#*.}
    month=${month%.*}

    # Year and day need only one expansion step, which is done here directly.
    # The 10# prefix forces base 10 so that a value like 08 is not read as octal.
    printf "%d%02d%02d\t%s\n" "${date##*.}" "$((10#$month))" "$((10#${date%%.*}))" "${line[*]}"

# Pipe result to sort, then remove the first column with cut
done < infile | sort -k 1,1 | cut -f 2-

The general idea is exactly the same: we add an extra column containing the reformatted date, sort by that and then remove it again.
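The parameter expansions in the loop are probably the least obvious part, so here they are applied step by step to one date from the question. The variable names date_str and rest are introduced just for this sketch.

```shell
#!/bin/bash
date_str='30.3.2016'     # sample date from the question

rest=${date_str#*.}      # remove shortest prefix ending in a dot  -> 3.2016
month=${rest%.*}         # remove shortest suffix starting at a dot -> 3
year=${date_str##*.}     # remove longest prefix ending in a dot   -> 2016
day=${date_str%%.*}      # remove longest suffix starting at a dot -> 30

# Build the sortable key: year, then month and day padded to two digits
key=$(printf '%d%02d%02d' "$year" "$month" "$day")
echo "$key"              # prints 20160330
```

A single # or % removes the shortest match and the doubled form removes the longest, which is why the month needs two steps while year and day need only one.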
