Numeric sort order for strings containing same-len

2020-06-27 06:06发布

Is it safe to assume that all collations in SQL Server will give the "expected" (i.e. numeric) sort order on strings containing integers of the same length? For example, assuming that @text contains just non-negative integers ([0-9]+), would the following snippet work for ensuring that the value does not overflow the int range, or might there be some collation where @text <= '2147483647' gives unexpected results?

IF LEN(@text) BETWEEN 1 AND 9 
OR LEN(@text) = 10 AND @text <= '2147483647'
-- ...

There won't be cases of variable-length comparisons (e.g. '2' < '11'), so please do not address that issue.

4条回答
够拽才男人
2楼-- · 2020-06-27 06:36
ORDER BY CASE WHEN ISNUMERIC(YourColumn) = 1 THEN Cast(YourColumn as Decimal(38,0)) ELSE null End
查看更多
姐就是有狂的资本
3楼-- · 2020-06-27 06:39

SQL Server collations do not guarantee anything about the encodings. They are mappings from binary representations of characters to the commonly understood characters.

For this purpose, I think the relevant concept is "code page":

Code Page

A code page is an ordered set of characters of a given script in which a numeric index, or code point value, is associated with each character. A Windows code page is commonly referred to as a character set or charset. Code pages are used to provide support for the character sets and keyboard layouts that are used by different Windows system locales. All Windows Server 2008 Unicode collations are Unicode 5.0-based.

The "ordered" piece is the part of interest for this problem. The ordering determines whether the characters are "naturally" ordered for numbers.

So, the concept of collations does not requires that numbers be naturally ordered.

That said, I am not aware of any collation anywhere where the digits '0' - '9' are not ordered naturally. In the Unicode standard, numbers are ordered naturally. I cannot think of a reason why anyone would create such a collation. So, in practice, I would be very, very surprised if such a collation existed. And, if it did, it would probably not be Unicode-compliant and so would not be available in SQL Server.

In practice, no such collation exists or at least no such collation is in wide-spread use. Natural ordering is not guaranteed by the definition of a collation, but is part of the Unicode character sets. You are pretty safe in assuming that all collations have natural ordering of numbers, but it is theoretically possible to create a character set with non-natural ordering of digits.

查看更多
在下西门庆
4楼-- · 2020-06-27 06:51

Like I said in my comment I don't think you are going to find hard documentation that all collations sort numbers the same as that is not a requirement of the collation. But it would be reasonable to assume/expect that. I cannot think of a single use case where I would not expect numbers to sort the same.

Kind of grunge but test them all
The first select just creates the selects
Then copy paste to run the output as selects
There is probably a way to do this in loop I could not get it to use collate to a variable

None failed this simple '54321' < '22222' or N'22222' < N'22221' test for me

SELECT 'SELECT ''' + name + ''' where  ''54321'' < ''22222'' or N''22222'' < N''22221'' COLLATE ' + name FROM sys.fn_helpcollations() WHERE name NOT LIKE 'SQL%';

set nocount on 
SELECT 'Albanian_BIN' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_BIN
SELECT 'Albanian_BIN2' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_BIN2
SELECT 'Albanian_CI_AI' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_CI_AI
SELECT 'Albanian_CI_AI_WS' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_CI_AI_WS
SELECT 'Albanian_CI_AI_KS' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_CI_AI_KS
SELECT 'Albanian_CI_AI_KS_WS' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_CI_AI_KS_WS
SELECT 'Albanian_CI_AS' where  '54321' < '22222' or N'22222' < N'22221' COLLATE Albanian_CI_AS

Response to your comment

Unfortunately, an exhaustive test would be unfeasible, not due to the number of collations, but due to the number of string comparisons that need to be made for each collation. Collations permit groups of characters to be considered together (e.g. “Mac” and “Mc”). Thus, '11' < '22' need not give the same result as '12' < '22'. In my case, I'd need to run 10^10 tests for each collation, as I'm dealing with 10-digit strings.

  • What does "Mac" and "Mc" have and to do with this? The state quesion is "There won't be cases of variable-length comparisons (e.g. '2' < '11'), so please do not address that issue."
  • You feel you need to test all 10 digit strings combinations?
    So if you tested all 4 digit combinations you are afraid two digits in position 7 and 8 of 10 might behave differently? Really if you are that concerned then even if you found documentation that all collations must sort numeric the same that does not mean they implemented properly.
  • If you are that sensitive to accuracy then why is convert to numeric out of scope? I get it your question and you bounty but may ask what the use case?

What I would do is a random order table of all 5 digits and random order table of all 4 digit (so you have odd and even). Have each collation sort and compare to known good sort.
That would run in 4-8 hours on even a mid range server. This question has a 5 day bounty.

查看更多
混吃等死
5楼-- · 2020-06-27 06:54

Is it safe to assume that all collations in SQL Server will give the "expected" (i.e. numeric) sort order on strings containing integers of the same length? = yes

...or might there be some collation where @text <= '2147483647' gives unexpected results? =no

查看更多
登录 后发表回答