Why is query with phone = N'1234' slower t

2019-03-09 20:06发布

问题:

I have a field which is a varchar(20)

When this query is executed, it is fast (Uses index seek):

SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'

But this one is slow (uses index scan).

SELECT * FROM [dbo].[phone] WHERE phone = N'5554474477'

I am guessing that if I change the field to an nvarchar, then it would use the Index Seek.

回答1:

Because nvarchar has higher datatype precedence than varchar so it needs to perform an implicit cast of the column to nvarchar and this prevents an index seek.

Under some collations it is able to still use a seek and just push the cast into a residual predicate against the rows matched by the seek (rather than needing to do this for every row in the entire table via a scan) but presumably you aren't using such a collation.

The effect of collation on this is illustrated below. When using the SQL collation you get a scan, for the Windows collation it calls the internal function GetRangeThroughConvert and is able to convert it into a seek.

CREATE TABLE [dbo].[phone]
  (
     phone1 VARCHAR(500) COLLATE sql_latin1_general_cp1_ci_as CONSTRAINT uq1 UNIQUE,
     phone2 VARCHAR(500) COLLATE latin1_general_ci_as CONSTRAINT uq2 UNIQUE,
  );

SELECT phone1 FROM [dbo].[phone] WHERE phone1 = N'5554474477';
SELECT phone2 FROM [dbo].[phone] WHERE phone2 = N'5554474477';

The SHOWPLAN_TEXT is below

Query 1

  |--Index Scan(OBJECT:([tempdb].[dbo].[phone].[uq1]),  WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone1],0)=CONVERT_IMPLICIT(nvarchar(4000),[@1],0)))

Query 2

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], [Expr1006], [Expr1004]))
       |--Compute Scalar(DEFINE:(([Expr1005],[Expr1006],[Expr1004])=GetRangeThroughConvert([@1],[@1],(62))))
       |    |--Constant Scan
       |--Index Seek(OBJECT:([tempdb].[dbo].[phone].[uq2]), SEEK:([tempdb].[dbo].[phone].[phone2] > [Expr1005] AND [tempdb].[dbo].[phone].[phone2] < [Expr1006]),  WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone2],0)=[@1]) ORDERED FORWARD)

In the second case the compute scalar emits the following values

Expr1004 = 62
Expr1005 = '5554474477'
Expr1006 = '5554474478'

the seek predicate shown in the plan is on phone2 > Expr1005 and phone2 < Expr1006 so on the face of it would exclude '5554474477' but the flag 62 means that this does match.



回答2:

Other answers already explain what happens; we've seen NVARCHAR has higher type precedence than VARCHAR. I want to explain why the database must cast every row for the column as an NVARCHAR, rather than casting the single supplied value as VARCHAR, even though the second option is clearly much faster, both intuitively and empirically.

Casting from NVARCHAR to VARCHAR is a narrowing conversion. That is, NVARCHAR has potentially more information than a similar VARCHAR value. It's not possible to represent every NVARCHAR input with a VARCHAR output, so casting from the former to the latter potentially loses some information. But the opposite cast is a widening conversion. Casting from a VARCHAR value to an NVARCHAR value never loses information; it's safe.

The idea is when presented with two mismatched types for a comparison, Sql Server should always choose the safe conversion. It's same old "correctness trumps performance" mantra. Or, to paraphrase Benjamin Franklin, "He who would trade essential correctness for a little performance deserve neither correctness nor performance." The type precedence rules, then, are designed to ensure the safe conversions are chosen.

Now you and I both know your narrowing conversion is also safe for this particular data, but the Sql Server query optimizer doesn't care about that. For better or worse, it sees the data type information first when building the execution plan and follows the type precedence rules.

Here's the real kicker: now we're making this cast, we have to do it for every row in the table. This is true even for rows which would not otherwise match the comparison filter. Morever, the cast values from the columns are no longer the same as the values stored in an index, such that any index on the column is now worthless for this query.

I think you're very lucky to be getting an index scan for this query, rather than a full table scan, and it's likely because there is a covering index that meets the needs of the query (the optimizer can choose to cast all the records in the index as easily as all the records in the table).

You can fix things for this query by explicitly resolving the type mismatch in a more favorable way. The best way to accomplish this is, of course, supplying a plain VARCHAR in the first place and avoid any need for casting/conversion at all:

SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'

But I suspect what we're seeing is a value provided by an application, where you don't necessarily control that part of the literal. If so, you can still do this:

SELECT * FROM [dbo].[phone] WHERE phone = cast(N'5554474477' as varchar(20))

Either example favorably resolves the type mismatch from the original code. Even with the latter situation, you may have more control over the literal than you know. For example, if this query was created from a .Net program the problem is possibly related to the AddWithValue() function. I've written about this issue in the past and how to handle it correctly.

These fixes also help demonstrate why things are this way.

It may be possible at some point in the future the Sql Server developers will enhance the query optimizer to look at situations where type precedence rules cause a per-row conversion resulting in a table or index scan, but the opposite conversion involves constant data and could be just an index seek, and in that case first look at the data to see if it would also be safe.

However, I find it unlikely they will ever do this. In my opinion, the corrections to queries within the existing system are too easy relative to the additional performance cost completing the evaluation for individual queries and the complexity in understanding what the optimizer is doing ("Why didn't the server follow the documented precedence rules here?") to justify it.



回答3:

 SELECT * FROM [dbo].[phone] WHERE phone = N'5554474477'

is interpreted as

 SELECT * from [dbo].[phone] WHERE CAST(phone as NVARCHAR) = N'5554474477'

which prevents index usage