T-SQL query performance puzzle: Why does using a v

2019-02-12 05:18发布

问题:

I'm trying to optimize a complex SQL query and getting wildly different results when I make seemingly inconsequential changes.

For example, this takes 336 ms to run:

Declare @InstanceID int set @InstanceID=1;
With myResults as (
    Select 
        Row = Row_Number() Over (Order by sv.LastFirst), 
        ContactID
    From DirectoryContactsByContact(1) sv 
    Join ContainsTable(_s_Contacts, SearchText, 'john') fulltext on (fulltext.[Key]=ContactID)
    Where IsNull(sv.InstanceID,1) = @InstanceID
    and len(sv.LastFirst)>1
) Select * From myResults Where Row between 1 and 20;  

If I replace the @InstanceID with a hard-coded number, it takes over 13 seconds (13890 ms) to run:

Declare @InstanceID int set @InstanceID=1;
With myResults as (
    Select 
        Row = Row_Number() Over (Order by sv.LastFirst), 
        ContactID
    From DirectoryContactsByContact(1) sv 
    Join ContainsTable(_s_Contacts, SearchText, 'john') fulltext on (fulltext.[Key]=ContactID)
    Where IsNull(sv.InstanceID,1) = 1
    and len(sv.LastFirst)>1
) Select * From myResults Where Row between 1 and 20;  

In other cases I get the exact opposite effect: For example, using a variable @s instead of the literal 'john' makes the query run more slowly by an order of magnitude.

Can someone help me tie this together? When does a variable make things faster, and when does it make things slower?

回答1:

The cause might be that IsNull(sv.InstanceID,1) = @InstanceID is very selective for some values of @InstanceID, but not very selective for others. For example, there could be millions of rows with InstanceID = null, so for @InstanceID = 1 a scan might be quicker.

But if you explicitly provide the value of @InstanceID, SQL Server knows based on the table statistics whether it's selective or not.

First, make sure your statistics are up to date:

UPDATE STATISTICS table_or_indexed_view_name 

Then, if the problem still occurs, compare the query execution plan for both methods. You can then enforce the fastest method using query hints.



回答2:

With hardcoded values the optimizer knows what to base on when building execution plan. When you use variables it tries to "guess" the value and in many cases it gets not the best one.

You can help it to pick a value for optimization in 2 ways:

  1. "I know better", this will force it to use the value you provide.

    OPTION (OPTIMIZE FOR(@InstanceID=1))

  2. "See what I do", this will instruct it to sniff the values you pass and use average (or most popular for some data types) value of those supplied over time.

    OPTION (OPTIMIZE FOR UNKNOWN)