I noticed the following sort outputs. Can anyone explain why the '.' line gets sorted first in the first run and last in the second?
I was trying to debug a program that looks up lines in a large sorted file, but the culprit seems to be my understanding of how Linux sort orders lines.
$ sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
/mnt/x/E
/mnt/x/.
<ctrl-D>
/mnt/x/.
________
/mnt/x/E
________
$ sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
/mnt/x/Ed
/mnt/x/.T
<ctrl-D>
/mnt/x/Ed
_________
/mnt/x/.T
_________
$
It's not that "." sorts before or after other characters; it isn't being examined at all. The comparison is based purely on the alphabetic characters.
In your first example, <end-of-string> sorts before E; in the second example, E sorts before T.
This behaviour is dependent on the locale settings for collation. You can influence this with environment variables, such as LC_COLLATE:
$ env LC_COLLATE=C sort
/mnt/x/Ed
/mnt/x/.T
^D
/mnt/x/.T
/mnt/x/Ed
$ env LC_COLLATE=en_US.UTF-8 sort
/mnt/x/Ed
/mnt/x/.T
^D
/mnt/x/Ed
/mnt/x/.T
$
Under the C locale, all ASCII characters are considered and are sorted in their ASCII order; in many other locales punctuation is ignored, which is presumably what is causing the behaviour you're seeing.
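One caveat: if LC_ALL is set in your environment, it takes precedence over LC_COLLATE, so setting LC_COLLATE alone would have no effect. A minimal sketch, assuming a POSIX shell and reusing the two paths from your example (the variable name sorted is only for illustration):

```shell
# LC_ALL takes precedence over LC_COLLATE and LANG, so this forces
# byte-order comparison for this one sort invocation only.
sorted=$(printf '%s\n' /mnt/x/Ed /mnt/x/.T | LC_ALL=C sort)
printf '%s\n' "$sorted"
# '.' (0x2E) has a lower byte value than 'E' (0x45),
# so /mnt/x/.T is printed before /mnt/x/Ed.
```

Setting the variable as a prefix on the sort command keeps the rest of your session in its usual locale.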
You can examine your locale settings using the locale command.
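For the lookup program you mention, it may help to confirm that the file is ordered the way the program expects. The sketch below assumes the program compares lines byte by byte (that is a guess about your program) and uses a small temporary file as a hypothetical stand-in for your large one; it also assumes mktemp is available.

```shell
# Hypothetical sample standing in for the large sorted file,
# written in the order the en_US.UTF-8 sort produced above.
tmp=$(mktemp)
printf '%s\n' /mnt/x/Ed /mnt/x/.T > "$tmp"

# sort -c only checks the order; it exits non-zero if the input
# is not already sorted under the rules in effect (here: C locale).
if LC_ALL=C sort -c "$tmp" 2>/dev/null; then
    byte_order=yes
else
    byte_order=no
fi
echo "sorted in byte order: $byte_order"
rm -f "$tmp"
```

If the check fails on your real file, re-sorting it with LC_ALL=C sort should make it consistent with a byte-wise lookup.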