When I compare two strings in SQL Server, there are a couple of simple ways: = or LIKE.
I want to redefine equality as:
If two strings contain the same words - no matter in what order - they are equal, otherwise they are not.
For example:
'my word' and 'word my' are equal
'my word' and 'aaamy word' are not
What's the best simple solution for this problem?
I don't think there is a simple solution for what you are trying to do in SQL Server. My first thought would be to create a CLR UDF that:
- Accepts two strings
- Breaks them into two arrays using the split function on " "
- Compares the contents of the two arrays, returning true if they contain the same elements.
If this is a route you'd like to go, take a look at this article to get started on creating CLR UDFs.
Try this... The StringSorter function breaks strings on a space and then sorts all the words and puts the string back together in sorted word order.
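CREATE FUNCTION dbo.StringSorter(@sep char(1), @s varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @ResultVar varchar(8000);
    -- Recursive CTE: each row holds the position of one separator (pos)
    -- and the position of the previous separator (lastPos).
    WITH sorter_cte AS (
        SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
        UNION ALL
        SELECT CHARINDEX(@sep, @s, pos + 1), pos
        FROM sorter_cte
        WHERE pos > 0
    )
    -- Cut out the word that sits between each pair of separator positions.
    , step2_cte AS (
        SELECT SUBSTRING(@s, lastPos + 1,
            case when pos = 0 then 8000
            else pos - lastPos -1 end) as chunk
        FROM sorter_cte
    )
    -- Reassemble the words in sorted order, each prefixed with a space.
    SELECT @ResultVar = (select ' ' + chunk
        from step2_cte
        order by chunk
        FOR XML PATH(''));
    RETURN @ResultVar;
END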
GO
Here is a test case just trying out the function:
SELECT dbo.StringSorter(' ', 'the quick brown dog jumped over the lazy fox');
which produced these results:
brown dog fox jumped lazy over quick the the
Then to run it from a select statement using your strings
SELECT case when dbo.StringSorter(' ', 'my word') =
dbo.StringSorter(' ', 'word my')
then 'Equal' else 'Not Equal' end as ResultCheck
SELECT case when dbo.StringSorter(' ', 'my word') =
dbo.StringSorter(' ', 'aaamy word')
then 'Equal' else 'Not Equal' end as ResultCheck
The first query returns 'Equal' and the second returns 'Not Equal'.
This should do exactly what you are looking for with a simple function utilizing a recursive CTE to sort your string.
Enjoy!
There is no simple way to do this. I would advise writing a function or stored procedure that does the processing this requirement involves.
Your function can use other functions that split the strings into parts, sort the words, etc.
Here's how you can split the strings:
T-SQL: Opposite to string concatenation - how to split string into multiple records
The scenario is as follows: use a TVF to split the first and second strings on spaces, then full join the two resulting tables on their values. If you get NULLs on the left or right side, the strings are not equal; otherwise they are equal.
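The full-join idea above could be sketched roughly as follows. This is only an illustration, and it assumes SQL Server 2016 or later (compatibility level 130+), where the built-in STRING_SPLIT function can stand in for a hand-written split TVF; on older versions you would substitute your own splitter. The variable names and the word alias are made up for the example.

```sql
DECLARE @s1 varchar(8000) = 'my word';
DECLARE @s2 varchar(8000) = 'word my';

-- Split both strings on spaces, full join the two word lists,
-- and look for any word that has no partner on the other side.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM (SELECT DISTINCT value AS word
                 FROM STRING_SPLIT(@s1, ' ')
                 WHERE value <> '') AS a          -- skip empty chunks from repeated spaces
           FULL JOIN
                (SELECT DISTINCT value AS word
                 FROM STRING_SPLIT(@s2, ' ')
                 WHERE value <> '') AS b
             ON a.word = b.word
           WHERE a.word IS NULL OR b.word IS NULL -- unmatched word on either side
       ) THEN 'Not Equal' ELSE 'Equal' END AS ResultCheck;
```

Because of the DISTINCT, this treats each string as a set of words, so 'my my word' and 'my word' would count as equal. If repeated words should matter, you would need to compare per-word counts instead.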
A VERY simple way to do this...
-- Returns the number of character positions at which the two strings differ.
ALTER FUNCTION [dbo].[ITS_GetDifCharCount]
(
    @str1 VARCHAR(MAX)
    ,@str2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
    DECLARE @result INT
    SELECT @result = COUNT(*)
    FROM dbo.ITS_CompareStrs(@str1,@str2 )
    RETURN @result
END
GO
-- Returns one row per position where the characters of the two strings differ.
ALTER FUNCTION [dbo].[ITS_CompareStrs]
(
    @str1 VARCHAR(MAX)
    ,@str2 VARCHAR(MAX)
)
RETURNS
@Result TABLE (ind INT, c1 char(1), c2 char(1))
AS
BEGIN
    DECLARE @i AS INT
        ,@c1 CHAR(1)
        ,@c2 CHAR(1)
    SET @i = 1
    -- Walk both strings position by position until the longer one is exhausted.
    WHILE LEN (@str1) > @i-1 OR LEN (@str2) > @i-1
    BEGIN
        IF LEN (@str1) > @i-1
            SET @c1 = substring(@str1, @i, 1)
        IF LEN (@str2) > @i-1
            SET @c2 = substring(@str2, @i, 1)
        INSERT INTO @Result([ind],c1,c2)
        SELECT @i,@c1,@c2
        SELECT @i=@i+1
            ,@c1=NULL
            ,@c2=NULL
    END
    -- Keep only the positions where the characters differ.
    DELETE FROM @Result
    WHERE c1=c2
    RETURN
END
You can add a precomputed column to the base table, populated by an INSERT/UPDATE trigger (or a UDF default), that splits, sorts and then concatenates the words from the original column.
Then use = to compare these precomputed columns.
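A rough sketch of the trigger variant might look like this. The table dbo.Phrases, its columns, and the trigger name are hypothetical, and the normalisation reuses the dbo.StringSorter function from the earlier answer; any other split-sort-concatenate function would do.

```sql
-- Hypothetical table: Phrase is the original text,
-- SortedPhrase holds its words in sorted order.
CREATE TABLE dbo.Phrases (
    Id           int IDENTITY(1,1) PRIMARY KEY,
    Phrase       varchar(8000) NOT NULL,
    SortedPhrase varchar(8000) NULL
);
GO

-- Keep SortedPhrase in sync whenever rows are inserted or updated.
CREATE TRIGGER dbo.trg_Phrases_Sort ON dbo.Phrases
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE p
    SET    SortedPhrase = dbo.StringSorter(' ', i.Phrase)
    FROM   dbo.Phrases AS p
    JOIN   inserted    AS i ON i.Id = p.Id;
END
GO

-- Find pairs of rows that contain the same words in any order.
SELECT a.Id, b.Id
FROM   dbo.Phrases AS a
JOIN   dbo.Phrases AS b
  ON   a.SortedPhrase = b.SortedPhrase
 AND   a.Id < b.Id;
```

The point of this design is that the split and sort cost is paid once per write, and the precomputed column can be indexed if it is short enough, so comparisons become plain equality checks.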
There is a library called SQL# (http://www.sqlsharp.com/) that contains a whole range of useful string/math functions.
It has a function called String_CompareSplitValues which does precisely what you want.
I am not sure whether it is in the community version or the paid-for version.
declare @s1 varchar(50) = 'my word'
declare @s2 varchar(50) = 'word my'
-- Split @s1 into one row per word.
declare @t1 table (word varchar(50))
while len(@s1)>0
begin
    if (CHARINDEX(' ', @s1)>0)
    begin
        insert into @t1 values(ltrim(rtrim(LEFT(@s1, charindex(' ', @s1)))))
        set @s1 = LTRIM(rtrim(right(@s1, len(@s1)-charindex(' ', @s1))))
    end
    else
    begin
        insert into @t1 values (@s1)
        set @s1=''
    end
end
-- Split @s2 the same way.
declare @t2 table (word varchar(50))
while len(@s2)>0
begin
    if (CHARINDEX(' ', @s2)>0)
    begin
        insert into @t2 values(ltrim(rtrim(LEFT(@s2, charindex(' ', @s2)))))
        set @s2 = LTRIM(rtrim(right(@s2, len(@s2)-charindex(' ', @s2))))
    end
    else
    begin
        insert into @t2 values (@s2)
        set @s2=''
    end
end
-- Check both directions, so a word missing from either side counts as a difference.
select case when exists(SELECT * FROM @t1 EXCEPT SELECT * FROM @t2)
              or exists(SELECT * FROM @t2 EXCEPT SELECT * FROM @t1)
    then 'are not' else 'are equal' end