How do I sort a tab separated file on the nth column

Posted 2020-07-06 05:53

Question:

I have a huge tab separated file which I want to sort on its 2nd column. I need to use the tab character as the field delimiter in cygwin sort. So I need something like this:

sort -t \t -k 2,2 in.txt > out.txt

But the command prompt evaluates '\t' literally and not as the tab character. Note that I need to do this on a Windows machine running Cygwin. Variations such as

sort -t "\t"
sort -t \"\t\"

don't work, neither does putting this in a cmd file with an actual tab in place of the \t above.
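
A rough way to see why these variations fail, assuming a POSIX-style shell such as the Cygwin bash prompt: the sketch below dumps the exact bytes the shell hands over for each spelling. `show_arg` is a hypothetical helper introduced only for this illustration.

```shell
# show_arg is a hypothetical helper: it dumps the bytes of its first argument.
show_arg() { printf '%s' "$1" | od -An -c; }

show_arg \t                  # unquoted: the shell drops the backslash, leaving the letter t
show_arg "\t"                # double-quoted: two characters, a backslash and a t
show_arg "$(printf '\t')"    # printf interprets \t, so this is one real tab
```

Only the last call passes a single tab character (od displays it as `\t`). In the first case sort would split fields on the letter t, and in the second GNU sort should reject the two-character separator.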

Edit: A solution using either the DOS shell or the Cygwin bash shell is fine.

Answer 1:

On my machine (Mac bash prompt, GNU sort ...) this works:

sort -t '   ' -k 2,2 in.txt > out.txt

(A "real" tab between the quotes.)

To enter the tab, I press CTRL-V followed by TAB.

EDIT: I've now tested it on a Windows machine from the cygwin prompt and it works the same there (as I expected, bash is bash).



Answer 2:

You need to add a $ sign in front of the '\t' so that bash applies ANSI-C quoting and expands \t to a real tab before passing it to sort. This works in bash (including the Cygwin bash prompt), but not in the Windows command prompt:

sort -t $'\t' -k 2,2 in.txt > out.txt
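
If the shell in use does not support the $'...' form, a portable alternative is to let printf produce the tab and pass it through a variable. The following is a minimal sketch under that assumption; `sample.txt`, `sorted.txt`, and the data in them are made up for illustration and stand in for in.txt and out.txt.

```shell
# Build a small tab-separated sample (hypothetical data, not the asker's file).
printf 'x\tpear\t3\ny\tapple\t1\nz\tmango\t2\n' > sample.txt

# Capture a literal tab character without relying on $'...' support.
tab=$(printf '\t')

# Sort on the second field only, using the tab as the field separator.
sort -t "$tab" -k 2,2 sample.txt > sorted.txt

cat sorted.txt
```

The rows should come out ordered by the second column (apple, mango, pear). Quoting "$tab" matters: unquoted, the shell would treat the tab as whitespace and sort would receive no separator at all.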


Answer 3:

In Windows Command Prompt, the simplest solution I found is to disable tab-completion first with:

cmd /f:off

Then you can type a literal tab character.



Answer 4:

I wanted a solution for GnuWin32 sort on Windows, but none of the above solutions worked for me on the command line. The following batch file (.bat) did work, which is what I wanted anyway. Type the tab character within the double quotes.

C:>cat foo.bat

sort -k3 -t" " tabfile.txt



Answer 5:

Anyone see the irony here?
You have to jump through hoops to get the tab character to be a tab...

On the Windows command prompt I was able to do it using:
c:\bin\sort -t"(actual tab)"
but only after starting cmd /f:off (as hinted above).

In a Windows bat file I was able to do the same thing, as long as the text editor (notepad2 :-) was set to insert tabs as tabs, not spaces.

There are some well-liked hints to use -t$'\t', but I tried about 2^16 combinations of this without luck or remaining hair.