TOP slows down query

2019-02-26 03:29发布

I have a pivot query on a table with millions of rows. Running the query normally, it runs in 2 seconds and returns 2983 rows. If I add TOP 1000 to the query it takes 10 seconds to run.

What could be causing this?

SELECT * 
  FROM (SELECT l.PatientID,
               l.LabID,
               l.Result
          FROM dbo.Labs l
          JOIN (SELECT MAX(LabDate) maxDate, 
                       PatientID, 
                       LabID 
                  FROM dbo.Labs 
              GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                          AND l.LabID = s.LabID
                                          AND l.LabDate = s.maxDate) A
 PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p

Execution plans: execution plans

This alternate formulation has the same problem:

select
    * 
FROM (
    SELECT 
        l.PatientID,
        l.LabID,
        l.Result
    FROM dbo.Labs l
    where l.LabDate = (
        select 
            MAX(LabDate) 
        from Labs l2 
        where l2.PatientID = l.PatientID 
            and l2.LabID = l.LabID
    )
) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p

3条回答
男人必须洒脱
2楼-- · 2019-02-26 04:15
SELECT  TOP 1000
        *
FROM    (
        SELECT  patientId, labId, result,
                DENSE_RANK() OVER (PARTITION BY patientId, labId ORDER BY labDate DESC) dr
        FROM    labs
        ) q
PIVOT   (
        MIN(result)
        FOR
        labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
        ) p
WHERE   dr = 1
ORDER BY
        patientId

You may also try creating an indexed view like this:

CREATE VIEW
        v_labs_patient_lab
WITH SCHEMABINDING
AS
SELECT  patientId, labId, COUNT_BIG(*) AS cnt
FROM    dbo.labs
GROUP BY
        patientId, labId

CREATE UNIQUE CLUSTERED INDEX
        ux_labs_patient_lab
ON      v_labs_patient_lab (patientId, labId)

and use it in the query:

SELECT  TOP 1000
        *
FROM    (
        SELECT  lr.patientId, lr.labId, lr.result
        FROM    v_labs_patient_lab vl
        CROSS APPLY
                (
                SELECT TOP 1 WITH TIES
                       result
                FROM   labs l
                WHERE  l.patientId = vl.patientId
                       AND l.labId = vl.labId
                ORDER BY
                       l.labDate DESC
                ) lr
        ) q
PIVOT   (
        MIN(result)
        FOR
        labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
        ) p
ORDER BY
        patientId
查看更多
做自己的国王
3楼-- · 2019-02-26 04:17

After doing some googling about suggesting an execution plan, I found the solution.

SELECT TOP 1000 * 
FROM (SELECT l.PatientID,
               l.LabID,
               l.Result
          FROM dbo.Labs l
          JOIN (SELECT MAX(LabDate) maxDate, 
                       PatientID, 
                       LabID 
                  FROM dbo.Labs 
              GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                          AND l.LabID = s.LabID
                                          AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
OPTION (HASH JOIN)

OPTION (HASH JOIN) being the thing. The resulting execution plan for the version with TOP looks like the original non-top one, with a TOP tacked on at the end.

Since I was originally doing this in a view what I actually ended up doing was changing JOIN to INNER HASH JOIN

查看更多
Luminary・发光体
4楼-- · 2019-02-26 04:29

There is a specific order in which queries are processed.

A normal SQL query will be written as follows:

SELECT [...]
  FROM [table1]
  JOIN [table2]
    ON [condition]
 WHERE [...]
 GROUP BY [...]
HAVING [...]
 ORDER BY [...]

But the processing order is different:

FROM [table1]
    ON [condition]
  JOIN [table2]
 WHERE [...]
 GROUP BY [...]
HAVING [...]
SELECT [...]
 ORDER BY [...]

When using SELECT DISTINCT [...] or SELECT TOP [...] the processing order will be as follows:

FROM [table1]
    ON [condition]
  JOIN [table2]
 WHERE [...]
 GROUP BY [...]
HAVING [...]
SELECT [...] DISTINCT[...]
ORDER BY [...]
TOP [....]

Hence it's taking longer as your SELECT TOP 1000 is processed last.

Take a look at this link for further details: http://blogs.msdn.com/b/sqlqueryprocessing/

查看更多
登录 后发表回答