We've stumbled upon a very odd problem in one of our applications. The seach engine uses a stored procedure to compare a bunch of filters. However when a specific type of string is inserted, the sql server (2005) behaves very odd. I've isolated the problem to the following:
select 'match!' where 'teliaa' like '%telia%'
The collation is Danish Norwegian CI AS and we have characters that mean the same thing. This includes 'aa' which also means 'å'.
Can anyone explain why the above statement does not yield 'match!'
The collation won't automatically match "aa" to "å".
It will make sure that "å" is sorted correctly and some other stuff but it won't substitute.
The same applies "ss" vs "ß" in German, for example
You'd have to clean the data one way or the other.
SELECT REPLACE ('teliå', 'å', 'aa'), /* ...or */REPLACE ('teliaa', 'aa', 'å')
Edit, May 2013
I'm guessing å
does not match aa
in that collation.
However it does sort correctly
DECLARE @foo TABLE (bar varchar(2))
INSERT @foo VALUES ('Ab'),('Aa'),('aa'), ('å'), ('Za');
SELECT * FROM @foo ORDER BY bar COLLATE Danish_Norwegian_CI_AS;
SELECT * FROM @foo WHERE bar COLLATE Danish_Norwegian_CI_AS = 'Aa';
SELECT * FROM @foo WHERE bar COLLATE Danish_Norwegian_CI_AS = 'a';
SELECT * FROM @foo WHERE bar COLLATE Danish_Norwegian_CI_AS = 'å';
Collation is the collection of rules for comparing characters, most useful for sorting. Some say it only affects sorting, this is not entirely correct. From https://technet.microsoft.com/en-us/library/aa174903%28v=sql.80%29.aspx
A SQL Server collation defines how the database engine stores and operates on character and Unicode data.
As an example, in the Danish_Norwegian accent insensitive collations matches 'aa' with 'å'. Names that begin with 'aa' are sorted alongside names that begin with 'å'. However, it also affects comparisons and the LIKE operator. Here are some query strings illustrating this.
select 'match!' where 'teliaa' collate Latin1_General_100_CI_AI like '%telia%' --yields "match!"
select 'match!' where 'teliaa' collate Latin1_General_100_CI_AS like '%telia%' --yields "match!"
select 'match!' where 'teliaa' collate Latin1_General_100_CS_AI like '%telia%' --yields "match!"
select 'match!' where 'teliaa' collate Latin1_General_100_CS_AS like '%telia%' --yields "match!"
select 'match!' where 'teliaa' collate Danish_Norwegian_CI_AI like '%telia%' --no rows
select 'match!' where 'teliaa' collate Danish_Norwegian_CI_AS like '%telia%' --no rows
select 'match!' where 'teliaa' collate Danish_Norwegian_CS_AI like '%telia%' --no rows
select 'match!' where 'teliaa' collate Danish_Norwegian_CS_AS like '%telia%' --no rows
select 'match!' where 'teliaa' collate Danish_Norwegian_CI_AI like '%å%' --yields "match!"
select 'match!' where 'teliaa' collate Danish_Norwegian_CI_AS like '%å%' --no rows
select 'match!' where 'teliaa' collate Danish_Norwegian_CS_AI like '%å%' --yields "match!"
select 'match!' where 'teliaa' collate Danish_Norwegian_CS_AS like '%å%' --no rows
Depending on your need, if you need to match 'aa' to 'å', treating them the same, then select a collation that supports this in your query or data store/columns. If not, then select a collation like Latin1_General_100_CI_AI.
select 'match!' where 'teliaa' like '%telia%'
gives 'match!' as output on my sql-2008 (SP1)10.0.2531.0(x64)
i can't answer your question...