I have the below SQL Server query:
with cte as
(SELECT DISTINCT refrecid
FROM docuref
WHERE ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note')
SELECT docuref.REFRECID, Notes = STUFF
((SELECT CHAR(13) + CHAR(10) + cast([NOTES] AS nvarchar(max))
FROM DOCUREF
WHERE REFRECID = Cte.refrecid AND ACTUALCOMPANYID = 'an' FOR XML PATH(''), TYPE ).value('.', 'nvarchar(max)'), 1, 2, '')
FROM Cte INNER JOIN
DOCUREF ON cte.REFRECID= docuref.REFRECID
WHERE DOCUREF.ACTUALCOMPANYID = 'an' and docuref.REFTABLEID='78' and docuref.typeid='Note'
GROUP BY docuref.REFRECID,cte.refrecid
The Docuref table contains about 40,000 rows. I am trying to combine the Notes column into one record where the RefrecID is the same,
eg if I have below:
Refrecid Recid Notes
1000 2000 Notes1
1000 2001 Notes2
1000 2002 Notes3
I would end up with:
Refrecid Notes
1000 Notes1
Notes2
Notes3
However, this query takes about 2 minutes to run, so need to cut that down a lot, so it takes only a few seconds. I have looked at the Actual Execution Plan and the item taking the most cost says 'Table Valued Function - XML Reader', with a cost of 91%. See below for actual execution plan:
https://www.brentozar.com/pastetheplan/?id=ByJMhcb7B
Is there a better way to do what I am doing?
EDIT:
So based on comments from @Shnugo, I used a Temp table to query against and the query running time has gone down from 2 minutes to 3 seconds. The query I am now using is:
IF OBJECT_ID('tempdb..#TempTable') Is Not null
Drop Table #TempTable
;
WITH grouped AS
(
SELECT dr.refrecid AS REFRECID
FROM docuref dr
WHERE dr.ACTUALCOMPANYID = 'ansa'
and dr.REFTABLEID='78'
and dr.typeid='Note'
GROUP BY dr.refrecid
)
select * into #TempTable from grouped
SELECT gr.REFRECID
,STUFF(
(
SELECT CHAR(13) + CHAR(10) + cast(dr2.[NOTES] AS nvarchar(max))
FROM DOCUREF dr2
WHERE dr2.REFRECID = gr.refrecid
AND dr2.ACTUALCOMPANYID = 'ansa'
FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'), 1, 2, '') AS Notes
FROM #TempTable gr;
This is just a quick shot, but it might help:
Read this article by Aaron Bertrand:
"Performance Surprises and Assumptions : GROUP BY vs. DISTINCT"
Change this cte-code
SELECT DISTINCT refrecid
FROM docuref
WHERE ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note'
To this
SELECT refrecid
FROM docuref
WHERE ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note'
GROUP BY refrecid
Using DISTINCT
the engine will return all rows and filter the final output. Using GROUP BY
will return a single refrecid and continue with this. Your execution plans might tell you (count of estimated and actual rows), if the set was reduced to single ids before or after you are calling the STUFF(sub select with XML)
...
Hint: The best idea was to upgrade to v2017 and use STRING_AGG().
UPDATE: I think this can be put much simpler...
Try it out:
SELECT dr.refrecid AS REFRECID
,STUFF(
(
SELECT CHAR(13) + CHAR(10) + cast(dr2.[NOTES] AS nvarchar(max))
FROM DOCUREF dr2
WHERE dr2.REFRECID = dr.refrecid --- this was cte.refrecid
AND dr2.ACTUALCOMPANYID = 'an'
FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'), 1, 2, '') AS Notes
FROM docuref dr
WHERE dr.ACTUALCOMPANYID = 'an'
and dr.REFTABLEID='78'
and dr.typeid='Note'
GROUP BY dr.refrecid;
Or this - but I expect this to be the same...
WITH grouped AS
(
SELECT dr.refrecid AS REFRECID
FROM docuref dr
WHERE dr.ACTUALCOMPANYID = 'an'
and dr.REFTABLEID='78'
and dr.typeid='Note'
GROUP BY dr.refrecid
)
SELECT gr.REFRECID
,STUFF(
(
SELECT CHAR(13) + CHAR(10) + cast(dr2.[NOTES] AS nvarchar(max))
FROM DOCUREF dr2
WHERE dr2.REFRECID = gr.refrecid
AND dr2.ACTUALCOMPANYID = 'an'
FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'), 1, 2, '') AS Notes
FROm grouped gr;
UPDATE 2
From your comment I take, that the solution for you was, to use a two-step approach:
- Create a temp-table with the filtered IDs
- Do the expensive action against this reduced list.
Good to know: The query optimizer will not work the query down in the way we might think. This is not a procedural step-by-step process. We tell the engine what we want (a formalised description of the result set) and the engine decides how to do this best.
In this case I'm pretty sure, that the compiler did not rechognize, that the thing within STUFF()
is going to be expensive. Assumably the STUFF()
was done for each row before the WHERE
and the GROUP BY
reduced the rows.
A CTE is not the same as a table (although we can use it within the query in the same way). Especiall with CTEs I'd love a hint like WITH(ForceFirst)
or something like this, which would tell the compiler to create the CTE's set before the rest of the query.
In such cases it is best to force the execution's order with a procedural approach (one-step-after-the-other).
Specify a singleton text node ((./text())[1]
) for the node path. This will help optimize the XML table-valued function in the query plan, which can provide a significant improvement for larger results.
with cte as
(SELECT DISTINCT refrecid
FROM docuref
WHERE ACTUALCOMPANYID = 'an' and REFTABLEID='78' and typeid='Note')
SELECT docuref.REFRECID, Notes = STUFF
((SELECT CHAR(13) + CHAR(10) + cast([NOTES] AS nvarchar(max))
FROM DOCUREF
WHERE REFRECID = Cte.refrecid AND ACTUALCOMPANYID = 'an' FOR XML PATH(''), TYPE ).value('(./text())[1]', 'nvarchar(max)'), 1, 2, '')
FROM Cte INNER JOIN
DOCUREF ON cte.REFRECID= docuref.REFRECID
WHERE DOCUREF.ACTUALCOMPANYID = 'an' and docuref.REFTABLEID='78' and docuref.typeid='Note'
GROUP BY docuref.REFRECID,cte.refrecid