Performance of User-Defined Table Types in SQL Ser

2019-01-24 08:41发布

问题:

We have been using User-Defined Table Types to pass a list of integers to our stored procedures.

We then use these to join to other tables in our stored proc queries.

For example:

CREATE PROCEDURE [dbo].[sp_Name]
(
    @Ids [dbo].[OurTableType] READONLY  
)
AS
    SET Nocount ON

    SELECT
        *
    FROM
        SOMETABLE
        INNER JOIN @Ids [OurTableType] ON [OurTableType].Id = SOMETABLE.Id

We have seen very poor performance from this when using larger datasets.

One approach we've used to speed things up, is the dump the contents into a temp table and join off that instead.

For example:

CREATE PROCEDURE [dbo].[sp_Name]
(
    @Ids [dbo].[OurTableType] READONLY  
)
AS
    SET Nocount ON
    CREATE TABLE #TempTable(Id INT)
    INSERT INTO #TempTable
    SELECT Id from @Ids

    SELECT
        *
    FROM
        SOMETABLE
        INNER JOIN #TempTable ON #TempTable.Id = SOMETABLE.Id

    DROP TABLE #TempTable

This does improve performance significantly, but I wanted to get some opinions on this approach and any other consequences we haven't considered. Also an explanation as to why this improves performance may also be useful.

N.B. sometime we may need to pass in more than just an integer, hence why we don't use a comma separated list or something like that.

回答1:

This topic has been discussed before. The primary reason for the poor performance of the JOIN is that the Table-Valued Parameter (TVP) is a Table Variable. Table Variables do not keep statistics and appear to the Query Optimizer to only have 1 row. Hence they are just fine to do something like INSERT INTO Table (column_list) SELECT column_list FROM @TVP; but not a JOIN.

There are a few things to try to get around this:

  1. Dump everything to a local temporary table (you are already doing this). A technical downside here is that you are duplicating the data passed into the TVP in tempdb (where both the TVP and temp table store their data).

  2. Maybe try defining the User-Defined Table Type to have a Clustered Primary Key. You can do this inline on the [Id] field:

    [ID] INT NOT NULL PRIMARY KEY
    

    Not sure how much this helps performance, but worth a try.

  3. You could try adding OPTION (RECOMPILE) to the query. This is a way of getting the Query Optimizer to see how many rows are in a Table Variable so that it can have proper estimates.

    SELECT column_list
    FROM   SOMETABLE
    INNER JOIN @Ids [OurTableType]
            ON [OurTableType].Id = SOMETABLE.Id
    OPTION (RECOMPILE);
    

    The downside here is that you have a RECOMPILE which takes additional time each time this proc is called. But that might be an overall net gain.

PS. Don't do SELECT *. Always specify a column list. Unless doing something like an IF EXIST(SELECT * FROM)....