Reduce cost for Table Valued Function - XML Reader

2019-08-16 02:11发布

问题:

I have the below SQL Server query:

with cte as 
(SELECT DISTINCT refrecid
FROM         docuref
WHERE     ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note')
SELECT     docuref.REFRECID,  Notes = STUFF
                            ((SELECT     CHAR(13) + CHAR(10) + cast([NOTES] AS nvarchar(max))
                                FROM         DOCUREF
                                WHERE     REFRECID = Cte.refrecid AND ACTUALCOMPANYID = 'an' FOR XML PATH(''), TYPE ).value('.', 'nvarchar(max)'), 1, 2, '')
 FROM         Cte INNER JOIN
                        DOCUREF ON cte.REFRECID= docuref.REFRECID
 WHERE     DOCUREF.ACTUALCOMPANYID = 'an' and docuref.REFTABLEID='78' and docuref.typeid='Note'
 GROUP BY docuref.REFRECID,cte.refrecid

The Docuref table contains about 40,000 rows. I am trying to combine the Notes column into one record where the RefrecID is the same,

eg if I have below:

Refrecid    Recid     Notes
1000        2000      Notes1
1000        2001      Notes2
1000        2002      Notes3

I would end up with:

Refrecid    Notes
1000        Notes1
            Notes2
            Notes3

However, this query takes about 2 minutes to run, so need to cut that down a lot, so it takes only a few seconds. I have looked at the Actual Execution Plan and the item taking the most cost says 'Table Valued Function - XML Reader', with a cost of 91%. See below for actual execution plan:

https://www.brentozar.com/pastetheplan/?id=ByJMhcb7B

Is there a better way to do what I am doing?

EDIT: So based on comments from @Shnugo, I used a Temp table to query against and the query running time has gone down from 2 minutes to 3 seconds. The query I am now using is:

IF OBJECT_ID('tempdb..#TempTable') Is Not null
Drop Table #TempTable
;
WITH grouped AS
(
    SELECT  dr.refrecid AS REFRECID
    FROM    docuref dr
    WHERE   dr.ACTUALCOMPANYID = 'ansa' 
        and dr.REFTABLEID='78' 
        and dr.typeid='Note'
    GROUP BY dr.refrecid
)

select * into #TempTable from grouped

SELECT gr.REFRECID
    ,STUFF(
    (
        SELECT CHAR(13) + CHAR(10) + cast(dr2.[NOTES] AS nvarchar(max))
        FROM DOCUREF dr2
        WHERE dr2.REFRECID = gr.refrecid 
            AND dr2.ACTUALCOMPANYID = 'ansa' 
        FOR XML PATH(''), TYPE 
    ).value('.', 'nvarchar(max)'), 1, 2, '') AS Notes
FROM #TempTable gr;

回答1:

This is just a quick shot, but it might help:

Read this article by Aaron Bertrand:
"Performance Surprises and Assumptions : GROUP BY vs. DISTINCT"

Change this cte-code

SELECT DISTINCT refrecid
FROM         docuref
WHERE     ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note'

To this

SELECT refrecid
FROM   docuref
WHERE  ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note'
GROUP BY refrecid

Using DISTINCT the engine will return all rows and filter the final output. Using GROUP BY will return a single refrecid and continue with this. Your execution plans might tell you (count of estimated and actual rows), if the set was reduced to single ids before or after you are calling the STUFF(sub select with XML)...

Hint: The best idea was to upgrade to v2017 and use STRING_AGG().

UPDATE: I think this can be put much simpler...

Try it out:

SELECT  dr.refrecid AS REFRECID
       ,STUFF(
       (
           SELECT CHAR(13) + CHAR(10) + cast(dr2.[NOTES] AS nvarchar(max))
           FROM DOCUREF dr2
           WHERE dr2.REFRECID = dr.refrecid --- this was cte.refrecid
             AND dr2.ACTUALCOMPANYID = 'an' 
           FOR XML PATH(''), TYPE 
       ).value('.', 'nvarchar(max)'), 1, 2, '') AS Notes
FROM    docuref dr
WHERE   dr.ACTUALCOMPANYID = 'an' 
    and dr.REFTABLEID='78' 
    and dr.typeid='Note'
GROUP BY dr.refrecid;

Or this - but I expect this to be the same...

WITH grouped AS
(
    SELECT  dr.refrecid AS REFRECID
    FROM    docuref dr
    WHERE   dr.ACTUALCOMPANYID = 'an' 
        and dr.REFTABLEID='78' 
        and dr.typeid='Note'
    GROUP BY dr.refrecid
)
SELECT gr.REFRECID
        ,STUFF(
        (
            SELECT CHAR(13) + CHAR(10) + cast(dr2.[NOTES] AS nvarchar(max))
            FROM DOCUREF dr2
            WHERE dr2.REFRECID = gr.refrecid 
                AND dr2.ACTUALCOMPANYID = 'an' 
            FOR XML PATH(''), TYPE 
        ).value('.', 'nvarchar(max)'), 1, 2, '') AS Notes
FROm grouped gr;

UPDATE 2

From your comment I take, that the solution for you was, to use a two-step approach:

  1. Create a temp-table with the filtered IDs
  2. Do the expensive action against this reduced list.

Good to know: The query optimizer will not work the query down in the way we might think. This is not a procedural step-by-step process. We tell the engine what we want (a formalised description of the result set) and the engine decides how to do this best.

In this case I'm pretty sure, that the compiler did not rechognize, that the thing within STUFF() is going to be expensive. Assumably the STUFF() was done for each row before the WHERE and the GROUP BY reduced the rows.

A CTE is not the same as a table (although we can use it within the query in the same way). Especiall with CTEs I'd love a hint like WITH(ForceFirst) or something like this, which would tell the compiler to create the CTE's set before the rest of the query.

In such cases it is best to force the execution's order with a procedural approach (one-step-after-the-other).



回答2:

Specify a singleton text node ((./text())[1]) for the node path. This will help optimize the XML table-valued function in the query plan, which can provide a significant improvement for larger results.

with cte as 
(SELECT DISTINCT refrecid
FROM         docuref
WHERE     ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note')
SELECT     docuref.REFRECID,  Notes = STUFF
                            ((SELECT     CHAR(13) + CHAR(10) + cast([NOTES] AS nvarchar(max))
                                FROM         DOCUREF
                                WHERE     REFRECID = Cte.refrecid AND ACTUALCOMPANYID = 'an' FOR XML PATH(''), TYPE ).value('(./text())[1]', 'nvarchar(max)'), 1, 2, '')
 FROM         Cte INNER JOIN
                        DOCUREF ON cte.REFRECID= docuref.REFRECID
 WHERE     DOCUREF.ACTUALCOMPANYID = 'an' and docuref.REFTABLEID='78' and docuref.typeid='Note'
 GROUP BY docuref.REFRECID,cte.refrecid