How can I make SQL Server return FALSE for compari

Posted 2019-01-18 23:24

Question:

If I deliberately store trailing spaces in a VARCHAR column, how can I force SQL Server to treat the data as a mismatch? For example, this query returns a row even though the two strings differ by trailing spaces:

SELECT 'foo' WHERE 'bar' = 'bar    '

I have tried:

SELECT 'foo' WHERE LEN('bar') = LEN('bar    ')

One method I've seen floated is to append a specific character to the end of every string then strip it back out for my presentation... but this seems pretty silly.

Is there a method I've overlooked?

I've noticed that this does not apply to leading spaces, so perhaps I could run a function that reverses the character order before the comparison. The problem is that this makes the query non-SARGable.

Answer 1:

From the docs on LEN (Transact-SQL):

Returns the number of characters of the specified string expression, excluding trailing blanks. To return the number of bytes used to represent an expression, use the DATALENGTH function

Also, from the support page on How SQL Server Compares Strings with Trailing Spaces:

SQL Server follows the ANSI/ISO SQL-92 specification on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them.
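A minimal sketch of the behavior described in the two quotes above. The variable names @a and @b are illustrative and not from the original post, and the DECLARE-with-initializer syntax assumes SQL Server 2008 or later. The equality test pads the shorter value before comparing, LEN excludes trailing blanks, and DATALENGTH counts the bytes actually held:

```sql
-- Illustrative variables (hypothetical names).
DECLARE @a VARCHAR(10) = 'bar';
DECLARE @b VARCHAR(10) = 'bar    ';  -- four trailing spaces

SELECT
    -- Padded comparison: the trailing spaces do not make the values differ.
    CASE WHEN @a = @b THEN 'equal' ELSE 'different' END AS EqualsResult,
    LEN(@a)        AS LenA,    -- trailing blanks excluded
    LEN(@b)        AS LenB,    -- trailing blanks excluded
    DATALENGTH(@a) AS BytesA,  -- bytes used by the value
    DATALENGTH(@b) AS BytesB;  -- bytes used, trailing spaces included
```

Only the two DATALENGTH columns should differ between @a and @b, which is why LEN alone cannot detect the trailing spaces.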

Update: I deleted my code using LIKE (which does not pad spaces during comparison) and DATALENGTH(), since they are not foolproof for comparing strings.

This has been asked in a lot of other places as well, with other solutions:

  • SQL Server 2008 Empty String vs. Space
  • Is it good practice to trim whitespace (leading and trailing)
  • Why would SqlServer select statement select rows which match and rows which match and have trailing spaces


Answer 2:

Like you said, I don't think there are many options. The only two I could come up with were these:

DECLARE @x nvarchar(50)
DECLARE @y nvarchar(50)
SET @x = 'CAT     '
SET @y = 'CAT'

SELECT 1 WHERE len(@x + '_') = len(@y + '_')

SELECT 1 WHERE reverse(@x) = reverse(@y)

EDIT

Thought of a third:

SELECT 1 WHERE REPLACE(@x, ' ', '_') = REPLACE(@y, ' ', '_')

And a fourth, assuming you're on SQL Server 2005 or later:

SELECT 1 WHERE QUOTENAME(@x) = QUOTENAME(@y)

Personally, I like the reverse idea the best, but it all depends on which one performs best for you.
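
As an illustration, the four options above can be run side by side over a few test pairs. This is only a sketch: the table variable @pairs and the result column names are hypothetical, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Illustrative test pairs; the table variable and column names are hypothetical.
DECLARE @pairs TABLE (x NVARCHAR(50), y NVARCHAR(50));

INSERT INTO @pairs (x, y) VALUES
    (N'CAT     ', N'CAT'),  -- differs only by trailing spaces
    (N'CAT',      N'CAT'),  -- identical
    (N'DOG',      N'CAT');  -- same length, different content

SELECT
    x,
    y,
    CASE WHEN x = y THEN 1 ELSE 0 END AS PlainEquals,
    -- Length-only check: says nothing about content, so pair it with x = y.
    CASE WHEN LEN(x + N'_') = LEN(y + N'_') THEN 1 ELSE 0 END AS SentinelLen,
    CASE WHEN REVERSE(x) = REVERSE(y) THEN 1 ELSE 0 END AS Reversed,
    CASE WHEN REPLACE(x, N' ', N'_') = REPLACE(y, N' ', N'_') THEN 1 ELSE 0 END AS Replaced,
    CASE WHEN QUOTENAME(x) = QUOTENAME(y) THEN 1 ELSE 0 END AS Quoted
FROM @pairs;
```

A few caveats are worth keeping in mind when choosing among them. The LEN check compares lengths only, so it cannot tell 'DOG' from 'CAT' and needs to be combined with an equality test. The REPLACE variant treats a space and an underscore as the same character. QUOTENAME returns NULL for input longer than 128 characters, so it is only suitable for short strings.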



Answer 3:

You could try something like this:

declare @a varchar(10), @b varchar(10)
set @a='foo'
set @b='foo   '

select @a, @b, DATALENGTH(@a), DATALENGTH(@b)


Answer 4:

I've only really got two suggestions. One would be to revisit the design that requires you to store trailing spaces - they're always a pain to deal with in SQL.

The second (given your SARG-able comments) would be to add a computed column to the table that stores the length, and add this column to the appropriate indexes. That way, at least, the length comparison should be SARG-able.
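
A minimal sketch of what such a computed column might look like. The table, column, and index names are hypothetical, and Value is assumed to be an indexable type such as VARCHAR(200). The sketch uses DATALENGTH rather than LEN for the stored length, because LEN ignores trailing blanks, and it assumes ANSI_PADDING is ON (the default) so that trailing spaces are kept when rows are inserted.

```sql
-- Hypothetical table; Value is assumed to be an indexable type such as VARCHAR(200).
CREATE TABLE dbo.MyTable (
    Id    INT IDENTITY(1,1) PRIMARY KEY,
    Value VARCHAR(200) NOT NULL,
    -- DATALENGTH, not LEN, because LEN ignores trailing blanks.
    ValueBytes AS DATALENGTH(Value) PERSISTED
);

-- Index both the value and its byte length so both predicates can seek.
CREATE INDEX IX_MyTable_Value_ValueBytes
    ON dbo.MyTable (Value, ValueBytes);

DECLARE @testValue VARCHAR(200) = 'x';

-- The padded equality finds candidates; the byte length rejects
-- rows that differ only by trailing spaces.
SELECT t.Id, t.Value
FROM dbo.MyTable AS t
WHERE t.Value = @testValue
  AND t.ValueBytes = DATALENGTH(@testValue);
```

Both predicates compare an indexed column to an expression that does not depend on the row, so neither wraps the column in a function at query time.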



Answer 5:

After some searching, the simplest solution I found was on Anthony Bloesch's weblog.

Just append some text (a single character is enough) to the end of both values:

SELECT 'foo' WHERE 'bar' + 'BOGUS_TXT' = 'bar    ' + 'BOGUS_TXT'

This also works with WHERE ... IN:

SELECT <columnA>
FROM <tableA>
WHERE <columnA> + 'BOGUS_TXT' in ( SELECT <columnB> + 'BOGUS_TXT' FROM <tableB> )


Answer 6:

The approach I'm planning to use is a normal comparison, which should be able to use an index (“sargable”), supplemented by a DATALENGTH check (because LEN ignores trailing whitespace). It would look like this:

DECLARE @testValue VARCHAR(MAX) = 'x';

SELECT t.Id, t.Value
FROM dbo.MyTable t
WHERE t.Value = @testValue AND DATALENGTH(t.Value) = DATALENGTH(@testValue)

It is up to the query optimizer to decide the order of the filters, but it should choose to use an index for the data lookup if that makes sense for the table being queried, and then filter the remaining rows by length with the more expensive scalar operations. However, as another answer stated, it would be better to avoid these scalar operations altogether by using an indexed computed column. The method presented here might make sense if you have no control over the schema, if you want to avoid creating computed columns, or if creating and maintaining them is considered more costly than the worse query performance.