Question:
I have a pivot query on a table with millions of rows. Running the query normally, it runs in 2 seconds and returns 2983 rows. If I add TOP 1000 to the query it takes 10 seconds to run.
What could be causing this?
SELECT *
FROM (SELECT l.PatientID,
l.LabID,
l.Result
FROM dbo.Labs l
JOIN (SELECT MAX(LabDate) maxDate,
PatientID,
LabID
FROM dbo.Labs
GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
AND l.LabID = s.LabID
AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
Execution plans:
This alternate formulation has the same problem:
select
*
FROM (
SELECT
l.PatientID,
l.LabID,
l.Result
FROM dbo.Labs l
where l.LabDate = (
select
MAX(LabDate)
from Labs l2
where l2.PatientID = l.PatientID
and l2.LabID = l.LabID
)
) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
Answer 1:
SELECT TOP 1000
*
FROM (
SELECT patientId, labId, result,
DENSE_RANK() OVER (PARTITION BY patientId, labId ORDER BY labDate DESC) dr
FROM labs
) q
PIVOT (
MIN(result)
FOR
labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
WHERE dr = 1
ORDER BY
patientId
You may also try creating an indexed view like this:
CREATE VIEW
v_labs_patient_lab
WITH SCHEMABINDING
AS
SELECT patientId, labId, COUNT_BIG(*) AS cnt
FROM dbo.labs
GROUP BY
patientId, labId
GO
CREATE UNIQUE CLUSTERED INDEX
ux_labs_patient_lab
ON v_labs_patient_lab (patientId, labId)
and use it in the query:
SELECT TOP 1000
*
FROM (
SELECT vl.patientId, vl.labId, lr.result
FROM v_labs_patient_lab vl
CROSS APPLY
(
SELECT TOP 1 WITH TIES
result
FROM labs l
WHERE l.patientId = vl.patientId
AND l.labId = vl.labId
ORDER BY
l.labDate DESC
) lr
) q
PIVOT (
MIN(result)
FOR
labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
ORDER BY
patientId
Answer 2:
There is a specific order in which queries are processed.
A normal SQL query will be written as follows:
SELECT [...]
FROM [table1]
JOIN [table2]
ON [condition]
WHERE [...]
GROUP BY [...]
HAVING [...]
ORDER BY [...]
But the processing order is different:
FROM [table1]
ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
When using SELECT DISTINCT [...] or SELECT TOP [...], the processing order will be as follows:
FROM [table1]
ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...] DISTINCT[...]
ORDER BY [...]
TOP [....]
Hence it's taking longer, as your SELECT TOP 1000 is processed last.
Take a look at this link for further details:
http://blogs.msdn.com/b/sqlqueryprocessing/
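The logical order listed above can be seen in which names each clause is allowed to reference. The following is a minimal T-SQL sketch, using a hypothetical inline VALUES table rather than the Labs table from the question. Note that this order describes logical evaluation; the physical plan the optimizer chooses may arrange the work differently.

```sql
-- Hypothetical inline data; the column names are illustrative only.
SELECT TOP (2)
       v.PatientID,
       v.Result * 2 AS Doubled
FROM (VALUES (1, 10), (2, 20), (3, 30)) AS v (PatientID, Result)
WHERE v.Result > 10      -- WHERE is evaluated before SELECT, so the alias Doubled is not visible here
ORDER BY Doubled DESC;   -- ORDER BY is evaluated after SELECT, so the alias can be used
                         -- TOP is applied last, to the ordered rows
```

Replacing `v.Result > 10` with `Doubled > 20` in the WHERE clause fails with an invalid column name error, which is consistent with WHERE being evaluated before SELECT.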
Answer 3:
After doing some googling about suggesting an execution plan, I found the solution.
SELECT TOP 1000 *
FROM (SELECT l.PatientID,
l.LabID,
l.Result
FROM dbo.Labs l
JOIN (SELECT MAX(LabDate) maxDate,
PatientID,
LabID
FROM dbo.Labs
GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
AND l.LabID = s.LabID
AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
OPTION (HASH JOIN)
OPTION (HASH JOIN) was the key. The resulting execution plan for the version with TOP looks like the original non-TOP one, with a TOP tacked on at the end.
Since I was originally doing this in a view, what I actually ended up doing was changing JOIN to INNER HASH JOIN.
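For reference, here is a rough sketch of what the view version might look like with the join hint in place. The view name `dbo.v_LatestLabsPivot` is hypothetical; the tables, columns, and pivot list are the ones from the question. A join hint is used because the OPTION clause is not allowed inside a view definition.

```sql
-- Hypothetical view name; everything else is taken from the query above.
CREATE VIEW dbo.v_LatestLabsPivot
AS
SELECT *
FROM (SELECT l.PatientID,
             l.LabID,
             l.Result
      FROM dbo.Labs l
      -- The join hint replaces OPTION (HASH JOIN), which a view cannot contain.
      INNER HASH JOIN (SELECT MAX(LabDate) maxDate,
                              PatientID,
                              LabID
                       FROM dbo.Labs
                       GROUP BY PatientID, LabID) s
        ON l.PatientID = s.PatientID
       AND l.LabID = s.LabID
       AND l.LabDate = s.maxDate) A
PIVOT (MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p;
GO

-- Usage: the TOP is applied on top of the hinted plan.
SELECT TOP 1000 *
FROM dbo.v_LatestLabsPivot;
```

Keep in mind that a join hint in the FROM clause also makes SQL Server enforce the written join order for the query, so it is worth rechecking the execution plan after the change.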