Sorting a tab delimited file

Posted 2019-01-04 16:53

I have data in the following format:

foo<tab>1.00<space>1.33<space>2.00<tab>3

Now I want to sort the file by the last field in decreasing order. I tried the following commands, but the output wasn't sorted as expected.

$ sort -k3nr file.txt  # apparently this also treats spaces as delimiters

$ sort -t"\t" -k3nr file.txt
  sort: multi-character tab `\\t'

$ sort -t "`/bin/echo '\t'`" -k3,3nr file.txt
  sort: multi-character tab `\\t'
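Why both attempts fail: -t accepts exactly one character. Inside double quotes "\t" is two characters (a backslash and a t), and the error shows that /bin/echo printed \t literally as well (GNU's /bin/echo only expands escapes with -e). A minimal POSIX sh sketch that builds a real tab with printf instead. It assumes a small hypothetical file in the question's format; the names, values, and temp file are made up for illustration:

```shell
# Hypothetical sample rows: name<TAB>three space-separated values<TAB>count
tmp=$(mktemp)
printf '%s\t%s\t%s\n' foo '1.00 1.33 2.00' 3 bar '0.50 0.75 1.00' 10 baz '2.00 2.50 3.00' 7 > "$tmp"

# "\t" in double quotes is a backslash followed by a t: wc counts 2 characters
printf '%s' "\t" | wc -c

# printf '\t' emits one real tab; command substitution only strips
# trailing newlines, so the tab survives
tab=$(printf '\t')

# With a tab delimiter the last column is field 3; ",3" ends the key there
sorted=$(sort -t "$tab" -k3,3nr "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```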

What's the right way to do it?

Here is the sample data.

9 answers

Reply #2 by ら.Afraid · 2019-01-04 16:58

In general, it's best to avoid storing data like this if you can, because people constantly confuse tabs and spaces.

Solving your problem is very straightforward in a scripting language like Perl, Python or Ruby. Here's some example code:

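#!/usr/bin/perl

use strict;
use warnings;

# 0-based index of the whitespace-separated field to sort on
my $sort_field = 2;
# Split on any run of whitespace, so tabs and spaces are treated alike
my $split_regex = qr{\s+};

my @data;
push @data, "7 8\t 9";
push @data, "4 5\t 6";
push @data, "1 2\t 3";

# Schwartzian transform: pair each line with its sort key, sort the pairs
# numerically by key (swap $a and $b for descending order), then keep the lines
my @sorted_data =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ ( split $split_regex, $_ )[$sort_field], $_ ] }
    @data;

print "unsorted\n";
print join "\n", @data, "\n";
print "sorted by $sort_field, lines split by $split_regex\n";
print join "\n", @sorted_data, "\n";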
Reply #3 by 我命由我不由天 · 2019-01-04 16:58

If you want to make things easier for yourself by having only tabs as delimiters, replace the spaces with tabs:

tr " " "\t" < <file> | sort <options>
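A sketch of this approach on hypothetical rows in the question's format (the data and the sort options are my assumptions, not part of the answer). Once every space is a tab, the last column of foo<tab>1.00<space>1.33<space>2.00<tab>3 becomes tab-separated field 5:

```shell
# Hypothetical sample rows in the question's format
tmp=$(mktemp)
printf '%s\t%s\t%s\n' foo '1.00 1.33 2.00' 3 bar '0.50 0.75 1.00' 10 baz '2.00 2.50 3.00' 7 > "$tmp"
tab=$(printf '\t')

# tr turns every space into a tab, leaving 5 tab-separated fields per line,
# so the last column is field 5
sorted=$(tr ' ' '\t' < "$tmp" | sort -t "$tab" -k5,5nr)
printf '%s\n' "$sorted"
rm -f "$tmp"
```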
Reply #4 by 三岁会撩人 · 2019-01-04 17:02

Pipe it through something like awk '{ print $1"\t"$2"\t"$3"\t"$4"\t"$5 }'. This rebuilds each line with tabs between the five whitespace-separated fields, so the spaces become tabs.
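A variant that avoids hard-coding five fields (a substitution for the answer's exact command, shown on hypothetical data): assigning $1 = $1 makes awk rebuild the whole record joined with OFS, so any number of blank-separated fields ends up tab-separated.

```shell
# Hypothetical sample rows in the question's format
tmp=$(mktemp)
printf '%s\t%s\t%s\n' foo '1.00 1.33 2.00' 3 bar '0.50 0.75 1.00' 10 baz '2.00 2.50 3.00' 7 > "$tmp"
tab=$(printf '\t')

# awk's default field splitting treats runs of spaces and tabs as one separator;
# "$1 = $1" forces awk to rebuild the line joined with OFS (a tab here)
sorted=$(awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' "$tmp" | sort -t "$tab" -k5,5nr)
printf '%s\n' "$sorted"
rm -f "$tmp"
```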

Reply #5 by Ridiculous、 · 2019-01-04 17:06

By default, sort splits fields at each transition from a non-blank to a blank character, so tabs should work just fine.

However, fields are numbered starting from 1, not 0, so you probably want

sort -k4nr file.txt

to sort file.txt numerically by column 4 in reverse order. (The sample line in the question actually has five fields, though, so the last field would be number 5.)
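A sketch of the default-delimiter behavior on hypothetical rows in the question's format, contrasting the suggested -k4nr with -k5,5nr. It assumes POSIX sort semantics, where -n skips the leading blank that a default-split field keeps:

```shell
# Hypothetical sample rows in the question's format
tmp=$(mktemp)
printf '%s\t%s\t%s\n' foo '1.00 1.33 2.00' 3 bar '0.50 0.75 1.00' 10 baz '2.00 2.50 3.00' 7 > "$tmp"

# No -t: spaces and tabs both separate fields, giving
#   foo | 1.00 | 1.33 | 2.00 | 3
# so -k4 keys on the 2.00-style column, while the last column is field 5
by4=$(sort -k4nr "$tmp" | cut -f1 | tr '\n' ' ')
by5=$(sort -k5,5nr "$tmp" | cut -f1 | tr '\n' ' ')
printf 'key 4: %s\nkey 5: %s\n' "$by4" "$by5"
rm -f "$tmp"
```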

Reply #6 by 姐就是有狂的资本 · 2019-01-04 17:08

I had this problem with sort under Cygwin in a bash shell when using --general-numeric-sort (-g). If I specified -t$'\t' -kFg, where F is the field number, it didn't work, but when I specified both -t$'\t' and -kF,Fg (e.g. -k7,7g for the 7th field), it did. -kF,Fg without -t$'\t' did not work.
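I can't reproduce the Cygwin behavior here, so this is only a sketch of what -g adds over -n. It assumes a sort that supports -g (GNU coreutils does) and hypothetical rows whose last column uses scientific notation:

```shell
# Hypothetical rows; bar's count is written as 1e1 (= 10)
tmp=$(mktemp)
printf '%s\t%s\t%s\n' foo '1.00 1.33 2.00' 3 bar '0.50 0.75 1.00' 1e1 baz '2.00 2.50 3.00' 7 > "$tmp"
tab=$(printf '\t')

# -g parses the key as a floating-point number (1e1 -> 10);
# -n stops reading at the "e" (1e1 -> 1)
by_g=$(sort -t "$tab" -k3,3gr "$tmp" | cut -f1 | tr '\n' ' ')
by_n=$(sort -t "$tab" -k3,3nr "$tmp" | cut -f1 | tr '\n' ' ')
printf 'general-numeric: %s\nnumeric: %s\n' "$by_g" "$by_n"
rm -f "$tmp"
```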

Reply #7 by smile是对你的礼貌 · 2019-01-04 17:09

Using bash, this will do the trick:

$ sort -t$'\t' -k3 -nr file.txt

Notice the dollar sign in front of the single-quoted string: it makes bash expand $'\t' into a single literal tab character. You can read about it in the ANSI-C Quoting section of the bash man page.
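A minimal bash sketch on hypothetical rows in the question's format (ksh and zsh also support $'...'; a plain POSIX sh may not). I used -k3,3 rather than -k3 so the key ends at the third field; for this data the order is the same:

```shell
#!/usr/bin/env bash
# Hypothetical sample rows in the question's format
tmp=$(mktemp)
printf '%s\t%s\t%s\n' foo '1.00 1.33 2.00' 3 bar '0.50 0.75 1.00' 10 baz '2.00 2.50 3.00' 7 > "$tmp"

# ANSI-C quoting: bash turns $'\t' into one tab before sort sees it
tab=$'\t'
echo "${#tab}"    # 1

sorted=$(sort -t$'\t' -k3,3nr "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```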
