The following list does not sort properly (IMHO):
$a = @( 'ABCZ', 'ABC_', 'ABCA' )
$a | sort
ABC_
ABCA
ABCZ
My handy ASCII chart and the Unicode "C0 Controls and Basic Latin" chart both give the underscore (low line) an ordinal of 95 (U+005F). This is a higher number than the capital letters A-Z, so Sort should have put the string ending with an underscore last.
Get-Culture reports en-US.
The next set of commands does what I expect:
$a = @( 'ABCZ', 'ABC_', 'ABCA' )
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABCA
ABCZ
ABC_
Now I create an ANSI-encoded file containing those same 3 strings:
Get-Content -Encoding Byte data.txt
65 66 67 90 13 10 65 66 67 95 13 10 65 66 67 65 13 10
$a = Get-Content data.txt
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ
Once more the string containing the underscore (low line) is not sorted the way I expect. What am I missing?
Edit:
Let's reference this example #4:
'A' -lt '_'
False
[char] 'A' -lt [char] '_'
True
Seems like both statements should be False or both should be True. I'm comparing strings in the first statement, and then comparing the Char type. A string is merely a collection of Char types so I think the two comparison operations should be equivalent.
And now for example #5:
Get-Content -Encoding Byte data.txt
65 66 67 90 13 10 65 66 67 95 13 10 65 66 67 65 13 10
$a = Get-Content data.txt
$b = @( 'ABCZ', 'ABC_', 'ABCA' )
$a[0] -eq $b[0]; $a[1] -eq $b[1]; $a[2] -eq $b[2];
True
True
True
[System.Collections.ArrayList] $al = $a
[System.Collections.ArrayList] $bl = $b
$al[0] -eq $bl[0]; $al[1] -eq $bl[1]; $al[2] -eq $bl[2];
True
True
True
$al.Sort( [System.StringComparer]::Ordinal )
$bl.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ
$bl
ABCA
ABCZ
ABC_
The two ArrayLists contain the same strings, but are sorted differently. Why?
PowerShell often wraps objects in PSObject and unwraps them again. In most cases this happens transparently and you do not even notice it, but in your case it is what causes the trouble.
$a='ABCZ', 'ABC_', 'ABCA'
$a|Set-Content data.txt
$b=Get-Content data.txt
[Type]::GetTypeArray($a).FullName
# System.String
# System.String
# System.String
[Type]::GetTypeArray($b).FullName
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject
As you can see, the objects returned by Get-Content are wrapped in PSObject, which prevents StringComparer from seeing the underlying strings and comparing them properly. A strongly typed string collection cannot store PSObjects, so PowerShell unwraps the strings before storing them in such a collection, which allows StringComparer to see the strings and compare them properly.
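As a minimal sketch of that unwrapping, assuming a throwaway file in the temp directory (the file name below is made up for illustration), casting the Get-Content result to [string[]] before sorting gives the ordinal comparer real strings to work with:

```powershell
# Hypothetical temp file, used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'psobject-sort-demo.txt'
'ABCZ', 'ABC_', 'ABCA' | Set-Content -Path $path

# The [string[]] cast makes PowerShell unwrap each PSObject into a plain System.String.
[string[]] $lines = Get-Content -Path $path

# Array.Sort sorts in place; with real strings the ordinal comparer compares character codes.
[Array]::Sort($lines, [System.StringComparer]::Ordinal)
$lines
# ABCA
# ABCZ
# ABC_

Remove-Item -Path $path
```

The cast is the only difference from the failing example in the question; the comparer and the data are the same.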
Edit:
First of all, when you write $a[1].GetType() or $b[1].GetType() you do not call .NET methods directly, but PowerShell methods, which normally call the .NET methods on the wrapped object. Thus you cannot get the real type of an object this way. What is more, they can be overridden; consider this code:
$c='String'|Add-Member -Type ScriptMethod -Name GetType -Value {[int]} -Force -PassThru
$c.GetType().FullName
# System.Int32
Let us call the .NET method through reflection:
$GetType=[Object].GetMethod('GetType')
$GetType.Invoke($c,$null).FullName
# System.String
$GetType.Invoke($a[1],$null).FullName
# System.String
$GetType.Invoke($b[1],$null).FullName
# System.String
Now we get the real type for $c, but it says that the type of $b[1] is String, not PSObject. As I said, in most cases unwrapping is done transparently, so you see the wrapped String and not the PSObject itself. One particular case where it does not happen is when you pass an array: the array elements are not unwrapped. So let us add an additional level of indirection here:
$Invoke=[Reflection.MethodInfo].GetMethod('Invoke',[Type[]]([Object],[Object[]]))
$Invoke.Invoke($GetType,($a[1],$null)).FullName
# System.String
$Invoke.Invoke($GetType,($b[1],$null)).FullName
# System.Management.Automation.PSObject
Now that we pass $b[1] as part of an array, we can see its real type: PSObject. I prefer to use [Type]::GetTypeArray instead, though.
About StringComparer: when the two objects being compared are not both strings, StringComparer relies on IComparable.CompareTo for the comparison. PSObject implements the IComparable interface, so the sort is done according to PSObject's IComparable implementation.
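Windows uses Unicode, not ASCII, and Sort-Object compares strings with culture-aware rules rather than by raw character code, so what you're seeing is the linguistic sort order for en-US. The general rules for sorting are:
- Special characters come before numbers.
- Numbers come before letters.
- Lowercase and uppercase letters are intermixed.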
Extending your example,
$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca' )
$a | sort-object
ABC_
ABC4
abca
ABCA
ABCZ
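For contrast, here is a small sketch of the same list sorted by character code instead. It assumes nothing beyond the standard .NET StringComparer and Array.Sort; the variable name $ordinal is only illustrative.

```powershell
$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca' )

# Copy into a real string array so the original list is left untouched.
[string[]] $ordinal = $a

# Ordinal comparison uses character codes: '4' (52) < 'A' (65) < 'Z' (90) < '_' (95) < 'a' (97).
[Array]::Sort($ordinal, [System.StringComparer]::Ordinal)
$ordinal
# ABC4
# ABCA
# ABCZ
# ABC_
# abca

# The same split shows up when comparing single values:
'A' -lt '_'                                # linguistic string comparison (False in the question)
[string]::CompareOrdinal('A', '_') -lt 0   # True, because 65 is less than 95
```

This is also why the [char] comparison in the question disagrees with the string comparison: characters are compared by their numeric code, strings by linguistic rules.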
If you really want to do this, here is one way. I will admit it's ugly, but it works. I would create a function if this is something you need to do on a regular basis.
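$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ab1z' )
# Encode each string as zero-padded character codes, e.g. 'AB' -> '00065;00066'.
# Fixed-width codes keep the text sort of the encoded form in numeric order.
$ascii = @()
foreach ($item in $a)
{
    $codes = @()
    for ($i = 0; $i -lt $item.Length; $i++)
    {
        $codes += '{0:D5}' -f [int] [char] $item[$i]
    }
    $ascii += $codes -join ';'
}
# Sort the encoded strings, then decode each one back to text.
$b = @()
foreach ($item in $ascii | Sort-Object)
{
    $string = ""
    foreach ($char in $item.Split(";"))
    {
        $string += [char] [int] $char
    }
    $b += $string
}
$a
$b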
ABCA
ABCZ
ABC_
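If this is needed regularly, one possible shape for such a function is sketched below. The name Sort-Ordinal is hypothetical (it is not a built-in cmdlet), and the sketch swaps the encode/decode approach for a typed list with an ordinal comparer:

```powershell
# Hypothetical helper: collects pipeline input as strings and sorts by character code.
function Sort-Ordinal {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string] $InputString
    )
    begin   { $items = New-Object 'System.Collections.Generic.List[string]' }
    process { $items.Add($InputString) }   # the [string] parameter type unwraps each item
    end {
        $items.Sort([System.StringComparer]::Ordinal)
        $items
    }
}

'ABCZ', 'ABC_', 'ABCA', 'ab1z' | Sort-Ordinal
# ABCA
# ABCZ
# ABC_
# ab1z
```

Because the parameter is typed as [string], lines coming from Get-Content should be converted to plain strings on the way in, so the same helper would apply to file input as well.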
I tried the following and the sort is as expected:
[System.Collections.ArrayList] $al = [String[]] $a