Sort ignores an apostrophe - sometimes (except whe

Posted 2019-09-20 11:04

Question:

This happens to me both on Linux and on cygwin, so I suspect it is not a bug. Still, I don't understand it. Can anyone explain?

Consider the following tab-delimited file (that's a regular apostrophe). I created it with cat to rule out non-printing characters as the source of the problem.

$cat > temp
cat     1389
cat'    1747
ca't    3175
cat     46848484
ca't    720

$sort temp
<gives the exact same output as cat temp>

$sort -k1,1 temp
cat     1389
cat     46848484
cat'    1747
ca't    3175
ca't    720

Why do I have to ignore the second column in order to sort correctly?

Answer 1:

I pulled up the manual for sort and noticed the following:

* WARNING * The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

As it turns out, each locale defines its own collation rules, that is, how lexicographic ordering works for that language and region. This makes a lot of sense, but for some reason it gives surprising results on multi-field files...

(see also:)
Unusual behaviour of linux's sort command
Why does the sort command sort differently if there are trailing fields?

There are a couple of things you can do:

You can sort naively by byte value using

LC_ALL="C" sort temp

This will give a more logical result, but it might not be the one you actually want.
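As a rough sketch of what byte-value ordering does to the file from the question: the snippet below rebuilds the file with explicit tabs and sorts it under LC_ALL=C. The mktemp file and the variable names are my own placeholders, not part of the question.

```shell
# Recreate the question's file with explicit tab separators.
# (tmp is a hypothetical scratch file, not the question's "temp".)
tmp=$(mktemp)
printf '%s\t%s\n' "cat" 1389 "cat'" 1747 "ca't" 3175 "cat" 46848484 "ca't" 720 > "$tmp"

# Byte-value ordering: tab (0x09) < apostrophe (0x27) < digits < letters,
# so "ca't" sorts before "cat", and "cat<TAB>" sorts before "cat'".
sorted=$(LC_ALL=C sort "$tmp")
printf '%s\n' "$sorted"
# Resulting order:
#   ca't<TAB>3175
#   ca't<TAB>720
#   cat<TAB>1389
#   cat<TAB>46848484
#   cat'<TAB>1747

rm -f "$tmp"
```

Here the apostrophe is treated as an ordinary byte, so it is never skipped, whether or not a second column is present.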

You could try to get sort to do a more basic lexicographical ordering by setting the locale to C and telling it you want dictionary ordering:

LC_ALL="C" sort -d temp
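For what it's worth, here is a small sketch of what -d does to the question's file. The scratch file and variable names are hypothetical. Because dictionary order looks only at blanks and alphanumerics, the apostrophe is skipped and every line is compared as "cat", a tab, and the number. As far as I can tell, this reproduces the order that looked wrong in the question, so it may not be the fix you are after.

```shell
# Recreate the question's file with explicit tab separators.
# (tmp is a hypothetical scratch file, not the question's "temp".)
tmp=$(mktemp)
printf '%s\t%s\n' "cat" 1389 "cat'" 1747 "ca't" 3175 "cat" 46848484 "ca't" 720 > "$tmp"

# -d considers only blanks and alphanumeric characters, so the apostrophe
# is ignored and the lines are effectively ordered by the digits after "cat".
dict_sorted=$(LC_ALL=C sort -d "$tmp")
printf '%s\n' "$dict_sorted"

# Compare against the unsorted input to see whether anything moved.
original=$(cat "$tmp")
if [ "$dict_sorted" = "$original" ]; then
    echo "sort -d left the lines in their original order"
fi

rm -f "$tmp"
```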

To have sort report the locale it is using and highlight the sort key on each line, you can use

sort --debug temp




Personally I'm really curious to know what rule is being specified that makes sort behave unintuitively across multiple fields.

Locales are supposed to specify the correct lexicographic order for the given language and dialect. Do the locale's collation functions simply not handle the multiple-field case at all, or are they applying a different interpretation of the "meaning" of the line?