Can anyone point me towards a sorting algorithm in javascript that would sort the same way SQL Server does (for nvarchar/unicode columns)?
For reference, my previous question about this behavior can be found here: SQL Server 2008 - different sort orders on VARCHAR vs NVARCHAR values
Rather than attempting to change the sorting behavior on the server side, is there a way I can match this on the client side? My previous question specifically talked about dashes in sort orders, but I'm going to assume there's a little more to it than simply ignoring dashes as part of the sort.
I have added some additional use cases here to better demonstrate the issue.
Sample data as sorted from SQL Server (2008):
?test
^&$Grails Found
bags of Garbage
Brochures distributed
Calls Received
exhibit visitors
Exhibit Visitors
-Exhibit Visitors
--Exhibit Visitors
Ëxhibit Visitors
Grails Found
How can I get javascript to sort the same values in the same way?
Please let me know if I can further clarify.
@BrockAdams' answer is great, but I had a few edge cases with hyphens in the middle of the string that didn't match up with SQL Server. I couldn't quite figure out where it was going wrong, so I wrote a more functional version that just filters out the ignored characters and then compares arrays based on the Latin code points.
It's probably less performant, but there's less code to understand, and it matches the SQL test cases that I've added below.
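As a rough illustration of the "filter, then compare arrays of code points" idea, here is a minimal sketch. It is not the code from this answer. The names IGNORED, toCodePoints, compareCodePointArrays and compareIgnoringHyphens are hypothetical, and treating only the hyphen as ignored, with "fewer hyphens first" as the tie-break, is an assumption based on the sample data in the question. Plain code point comparison also puts uppercase before lowercase, so this alone will not reproduce the full SQL Server ordering.

```javascript
// Assumption: only the hyphen is ignored on the first pass.
const IGNORED = new Set(["-"]);

// Drop ignored characters and turn the rest into an array of code points.
function toCodePoints(str) {
  return Array.from(str)
    .filter((ch) => !IGNORED.has(ch))
    .map((ch) => ch.codePointAt(0));
}

// Compare two numeric arrays element by element; the shorter one wins a prefix tie.
function compareCodePointArrays(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

function compareIgnoringHyphens(x, y) {
  const primary = compareCodePointArrays(toCodePoints(x), toCodePoints(y));
  if (primary !== 0) return primary;
  // Tie-break (assumption): the string with fewer ignored characters sorts first.
  return x.length - y.length;
}

const sorted = ["--Exhibit Visitors", "Exhibit Visitors", "-Exhibit Visitors"]
  .sort(compareIgnoringHyphens);
console.log(sorted);
```

With this comparator, a hyphen in the middle of a word only matters when the strings are otherwise equal, so "co-op" lands next to "coop" rather than ahead of every letter.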
I was using a SQL Server database with Latin1_General_100_CI_AS, so it was case-insensitive, but I've kept the code here case-sensitive. It's easy enough to switch to case-insensitive checking by creating a wrapper function that applies toLowerCase to the variables. There wasn't a difference in the sorting between the two collations with the test cases I had.
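The wrapper mentioned above could look something like the following sketch. The names caseInsensitive and byCodeUnit are hypothetical, and byCodeUnit is only a stand-in comparator so the example runs on its own; any case-sensitive comparator could be passed in instead.

```javascript
// Stand-in case-sensitive comparator (assumption: plain code unit ordering).
function byCodeUnit(a, b) {
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Wrap a comparator so both inputs are lower-cased before being compared.
function caseInsensitive(compare) {
  return (a, b) => compare(a.toLowerCase(), b.toLowerCase());
}

const compareCI = caseInsensitive(byCodeUnit);
console.log(["b", "A", "C"].sort(compareCI));
```

Note that strings differing only in case compare as equal here, so their relative order depends on the (stable) sort and the input order.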
I also made a SQL Fiddle to double-check it. Should the link ever break, here's a screenshot of how it looks:
First, what is your database collation? I'm going to assume it's SQL_Latin1_General_CP1_CS_AS or SQL_Latin1_General_CP1_CI_AS. If so, then the following should work (not fully tested yet).
It looks like writing a true Unicode sorter is a major undertaking. I've seen tax codes that were more straightforward than the specs. ;-) It always seems to involve lookup table(s) and at least a 3-level sort -- with modifying characters and contractions to account for.
I've limited the following to the Latin 1, Latin Extended-A, and Latin Extended-B tables/collation. The algorithm should work on those sets fairly well but I've not fully tested it nor properly accounted for modifying characters (to save speed and complexity).
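To make the "lookup table plus 3-level sort" idea concrete, here is a toy sketch. It is not the function from this answer. The WEIGHTS table is hypothetical and covers only six characters, and its numbers are made up for illustration rather than taken from SQL Server or the Unicode collation data. Each character gets a primary (base letter), secondary (accent) and tertiary (case) weight, and a later level is consulted only when the earlier ones tie across the whole string.

```javascript
// Hypothetical weights: [primary (letter), secondary (accent), tertiary (case)].
const WEIGHTS = {
  "e": [1, 0, 0], "E": [1, 0, 1],
  "\u00EB": [1, 1, 0], "\u00CB": [1, 1, 1], // e and E with diaeresis
  "f": [2, 0, 0], "F": [2, 0, 1],
};

// Build the array of weights for one level of one string.
function keysAt(str, level) {
  return Array.from(str, (ch) => {
    const w = WEIGHTS[ch];
    if (!w) throw new Error("no weight defined for " + ch);
    return w[level];
  });
}

function compareArrays(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Compare all primary weights first, then accents, then case.
function threeLevelCompare(x, y) {
  for (let level = 0; level < 3; level++) {
    const r = compareArrays(keysAt(x, level), keysAt(y, level));
    if (r !== 0) return r;
  }
  return 0;
}

console.log(["F", "\u00CB", "e", "E", "\u00EB", "f"].sort(threeLevelCompare));
```

A real table would need an entry for every supported character, plus handling for the modifying characters and contractions mentioned above, which this sketch leaves out.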
See it in action at jsbin.com.
Function:
Test:
Results:
Sorry, JavaScript has no collation features. The only string comparison you get is directly on the UTF-16 code units in a String, as returned by charCodeAt().
For characters inside the Basic Multilingual Plane, that's the same as a binary collation, so if you need JS and SQL Server to agree (ignoring the astral planes anyway), I think that's the only way you're going to do it. (Short of building a string collator in JS and meticulously copying SQL Server's collation rules, anyway. Not a lot of fun there.)
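For illustration, the code unit comparison described above can be written out explicitly. This is a minimal sketch; compareCodeUnits is a hypothetical name, and the built-in < and > operators on strings (and the default Array.prototype.sort) already order strings the same way.

```javascript
// Compare two strings by their UTF-16 code units, as returned by charCodeAt().
function compareCodeUnits(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const diff = a.charCodeAt(i) - b.charCodeAt(i);
    if (diff !== 0) return diff;
  }
  // One string is a prefix of the other: the shorter one sorts first.
  return a.length - b.length;
}

const words = ["b", "a", "B", "A"];
console.log(words.slice().sort(compareCodeUnits));
```

Matching this on the server side would mean sorting the column with a binary collation as well, as the answer suggests.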
(What's the use case, why do they need to match?)