I have text file which contains hundreds of thousands of records. One of the fields is a date field.
Is there is any way to sort the file based on the date field?
09-APR-12 04.08.43.632279000 AM
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
19-MAR-12 03.54.32.595348000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
27-MAR-12 07.28.02.828746000 PM
The Output should come as
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM
I have tried the sort command to order the date (taking the date field as a string), but it is not giving the correct output.
Chronicle's solution is close, but misses the AM/PM distinction, sorting 27-MAR-12 07.28.02.828746000 PM
before 27-MAR-12 10.28.14.797580000 AM
. This can be modified:
sort -t- -k 3.1,3.2 -k 2M -k 1n -k 3.23,3.24
But that is still very fragile. It would be much better to convert the dates to an epoch time and compare numerically.
Try this :
Input.txt
09-APR-12 04.08.43.632279000 AM
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
19-MAR-12 03.54.32.595348000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
27-MAR-12 07.28.02.828746000 PM
Code
sort -t "-" -k 3 -k 2M -nk 1 Input.txt
Output
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 07.28.02.828746000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM
This script sorts by Epoch time with nanosecond resolution:
awk '{
t = gensub(/\.([0-9]{2})\./, ":\\1:", 1, $0);
command = "date +%s%N -d \x022" t "\x022";
command | getline t;
close(command);
print t, $0;
}' unsorted.txt | sort -n -k 1 | cut -d ' ' -f 2- > sorted.txt
You could use date, which is generally probably a decent idea, especially if you don't need to worry about the microseconds, otherwise you could probably clip the microseconds off and sort that as a secondary sorting field.
while read a; do
grep "^${a}" input.txt;
done < <(sed 's/\./:/;s/\./:/' input.txt | xargs -n3 -I{} date -d"{}" +%s | sort | xargs -n1 -I{} date -d @'{}' +'%d-%^h-%y %I.%M.%S')