Sql Query Pervious Row Optimisation

2019-01-15 17:40发布

问题:

Here is my table structure

MyTable
-----------

ObjectID int (Identity),           -- Primary Key
FileName varchar(10),
CreatedDate datetime
...........
...........
...........

I need to get the time taken to create record in a file... ie... Time elapsed between the previous record in the same file and the current record of the same file

ie... If the records are

ObjectID    FileName    CreatedDate (just showing the time part here)
--------    --------    -----------
1           ABC         10:23
2           ABC         10:25
3           DEF         10:26
4           ABC         10:30
5           DEF         10:31
6           DEF         10:35

The required output is...

ObjectID    FileName    CreatedDate     PrevRowCreatedDate
--------    --------    ----------- ---------------
1           ABC         10:23           NULL
2           ABC         10:25           10:23
3           DEF         10:26           NULL
4           ABC         10:30           10:25
5           DEF         10:31           10:26
6           DEF         10:35           10:31

So far I got this query, but it is taking a very long time than expected... Is there a better way to do it...

    Select  A.ObjectID, 
        A.FileName
        A.CreatedDate as CreatedDate, 
        B.PrevRowCreatedDate,
        datediff("SS", '1900-01-01 00:00:00', Coalesce((A.CreatedDate - B.PrevRowCreatedDate),0)) as secondsTaken
    from MyTable as A 
        Cross Apply (       
        (Select PrevRowCreatedDate = Max(CreatedDate) from MyTable as BB 
                        where   BB.FileName = A.FileName and 
                                BB.CreatedDate < A.CreatedDate
        )
        ) as B  

Please let me know incase you need more information

Thanks

回答1:

SELECT t1.FileName, t1.CreatedDate, t2.CreatedDate as PrevCreatedDate
FROM 
   (SELECT FileName, CreateDate,
          ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) t1
LEFT JOIN
   (SELECT FileName, CreateDate,
     ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
     FROM MyTable) t2
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo - 1)

Or may be better use 'WITH', because queries is identical:

WITH t(ObjectID, FileName, CreatedDate, OrderNo) AS
   (SELECT ObjectID, FileName, CreatedDate,
          ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) 
SELECT t1.ObjectID, t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', 
           COALESCE((t1.CreatedDate - t2.CreatedDate),0)) AS secondsTaken
FROM t t1 LEFT JOIN t t2 
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)


回答2:

I think Michael's answer should indeed prove more efficient. When evaluating efficiency though I just want to draw attention to an issue with the query costs (relative to the batch) shown in Management Studio.

I set up a test table with 23,174 rows and ran the query in the question and Michael's. Looking at the "query cost (relative to the batch)" in the actual execution plan the original query was 1% and Michael's 99% cost and so appears to be massively inefficient.

However the actual stats tell a completely different story

Cross Apply Approach

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'MyTable'. Scan count 23175, logical reads 49335, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

ROW_NUMBER approach

SQL Server Execution Times: CPU time = 391 ms, elapsed time = 417 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'MyTable'. Scan count 2, logical reads 148, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In the ROW_NUMBER plan the Merge Join on rownumber=rownumber+1 has 23,174 rows going in to both sides. This value is unique and actual rows out is 23,174 as well. However SQL Server estimates that the rows produced from that join will be 34,812,000 and thus its estimated cost for the insert later in the plan is wildly inaccurate.

Test Script

BEGIN TRAN

CREATE TABLE MyTable
  (
     [ObjectID]    [INT] IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
     [FileName]    [VARCHAR](50) NULL,
     [CreatedDate] [DATETIME] NULL
  )

GO

INSERT INTO MyTable
SELECT ISNULL(type, NEWID()),
       DATEADD(DAY, CAST(RAND(CAST(NEWID() AS VARBINARY)) * 10000 AS INT), GETDATE())
FROM   master.dbo.spt_values,
       (SELECT TOP 10 1 AS X FROM  master.dbo.spt_values) V


DELETE FROM MyTable
WHERE  EXISTS(SELECT *
              FROM   MyTable m2
              WHERE  MyTable.CreatedDate = m2.CreatedDate
                     AND MyTable.FileName = m2.FileName
                     AND MyTable.ObjectID < m2.ObjectID)

CREATE UNIQUE NONCLUSTERED INDEX [IX_MyTable]
  ON MyTable ([FileName] ASC, [CreatedDate] ASC)

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT A.ObjectID,
       A.FileName,
       A.CreatedDate                                                                                AS CreatedDate,
       B.PrevRowCreatedDate,
       DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE(( A.CreatedDate - B.PrevRowCreatedDate ), 0)) AS secondsTaken
INTO   #A
FROM   MyTable AS A
       CROSS APPLY ((SELECT PrevRowCreatedDate = MAX(CreatedDate)
                     FROM   MyTable AS BB
                     WHERE  BB.FileName = A.FileName
                            AND BB.CreatedDate < A.CreatedDate)) AS B;

WITH t(ObjectID, FileName, CreatedDate, OrderNo)
     AS (SELECT ObjectID,
                FileName,
                CreatedDate,
                RANK() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
         FROM   MyTable)
SELECT t1.ObjectID,
       t1.FileName,
       t1.CreatedDate,
       t2.CreatedDate                                                                          AS PrevCreatedDate,
       DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE(( t1.CreatedDate - t2.CreatedDate ), 0)) AS secondsTaken
INTO   #B
FROM   t t1
       LEFT JOIN t t2
         ON ( t1.FileName = t2.FileName
              AND t1.OrderNo = t2.OrderNo + 1 )

/*Test the 2 queries give the same result*/
SELECT *
FROM   #A
EXCEPT
SELECT *
FROM   #B

SELECT *
FROM   #B
EXCEPT
SELECT *
FROM   #A

ROLLBACK