Can someone please explain this:
SELECT
CASE WHEN CAST('iX' AS nvarchar(20))
> CAST('-X' AS nvarchar(20)) THEN 1 ELSE 0 END,
CASE WHEN CAST('iX' AS varchar(20))
> CAST('-X' AS varchar(20)) THEN 1 ELSE 0 END
Results: 0 1
SELECT
CASE WHEN CAST('i' AS nvarchar(20))
> CAST('-' AS nvarchar(20)) THEN 1 ELSE 0 END,
CASE WHEN CAST('i' AS varchar(20))
> CAST('-' AS varchar(20)) THEN 1 ELSE 0 END
Results: 1 1
On the first query, the nvarchar()
result is not what I'm expecting, and yet removing the X
make the nvarchar()
sort happen as expected.
(My original queries used the ''
and N''
literal syntax to distinguish varchar()
and nvarchar()
rather than CAST()
and got the same result.)
Collation setting for the database is SQL_Latin1_General_CP1_CI_AS
.
When comparing unicode text, hyphens are treated specially. Unicode comparison uses "dictionary order", which ignores hyphens. This is not the case with non-unicode text comparison.
Comparing -X
and iX
, is like comparing X
and iX
, so -X
, the left side, is greater.
When comparing "-" and "i", is like comparing "" and "i", so "i", the right side is greater.
From MSDN,
A SQL collation's rules for sorting
non-Unicode data are incompatible with
any sort routine that is provided by
the Microsoft Windows operating
system; however, the sorting of
Unicode data is compatible with a
particular version of the Windows
sorting rules. Because the comparison
rules for non-Unicode and Unicode data
are different, when you use a SQL
collation you might see different
results for comparisons of the same
characters, depending on the
underlying data type. For example, if
you are using the SQL collation
"SQL_Latin1_General_CP1_CI_AS", the
non-Unicode string 'a-c' is less than
the string 'ab' because the hyphen
("-") is sorted as a separate
character that comes before "b".
However, if you convert these strings
to Unicode and you perform the same
comparison, the Unicode string N'a-c'
is considered to be greater than N'ab'
because the Unicode sorting rules use
a "word sort" that ignores the hyphen.
SELCT body From MSDN_Articles WHERE url IN
("http://support.microsoft.com/kb/322112")
A nice question!
Digging around, I found that the issue is related to hyphens and apostrophes. Your example exhibits the same 'odd' behaviour with '''X' as with '-X'.
I can't take credit for finding the answer, because it's here:
SQL Sorting and hyphens