TSQL - Recursive CTE inefficient - Need an alterna

2020-07-14 10:36发布

问题:

Here is a table with sample data:

DECLARE @TestTable TABLE (
    ItemID INT,
    A INT,
    B INT,
    Month INT)

INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)

Something to note is that the B column is always the same as it's only used once in this calculation, but is needed for the initial calculation.

I am attempting to subtract B from A on the first row, then on subsequent rows subtract the previous rows difference from A. Effectively, B - A = C on the first then C - A on all subsequent rows FOR THE RELATED ItemID.

Here are the results I'm expecting:

ItemID  A   B   C   Month   RowNumber
1234    5   9   4   1       1
1234    6   9   -2  2       2
1234    1   9   -3  3       3
1324    14  6   -8  1       1
1324    5   6   -13 2       2
1324    9   6   -22 3       3
4321    5   11  6   1       1
4321    12  11  -6  2       2

Here is how I am accomplishing this.

;WITH CTE_TestValue AS (
    SELECT 
        Main.ItemID,
        Main.A,
        Main.B,
        Main.Month,
        ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
    FROM @TestTable AS Main
),
CTE_TestColumnC AS (
    SELECT 
        MainA.ItemID,
        MainA.A,
        MainA.B,
        (MainA.B - MainA.A) AS C,
        MainA.Month,
        MainA.RowNumber
    FROM CTE_TestValue AS MainA
        WHERE MainA.Rownumber = 1

    UNION ALL

    SELECT 
        MainB.ItemID,
        MainB.A,
        MainB.B,
        (Sub.C - MainB.A) AS C,
        MainB.Month,
        MainB.RowNumber
    FROM CTE_TestValue AS MainB
        INNER JOIN CTE_TestColumnC AS Sub
            ON MainB.RowNumber - 1 = Sub.RowNumber
            AND MainB.ItemID = Sub.ItemID
--      CROSS JOIN CTE_TestColumnC AS Sub
--          WHERE Sub.RowNumber + 1 = MainB.RowNumber
--          AND MainB.ItemID = Sub.ItemID 
)
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.C,
    Main.Month,
    Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber

This works fine on a small data-sample, but I'm dealing with about 20,000 ItemId's each repeating 10 times. It finishes all the first row calculations instantly, as expected, and then the calculation times go up DRASTICALLY.

As you can see I've tried both an INNER JOIN and a CROSS JOIN. I believe they have the same execution plan with the parameters that I've given the CROSS JOIN.

Is there a more effective/efficient way to accomplish this?

I allowed this to run for 5 hours yesterday to see if it ever ended.. it did not.

Another note: When I'm using this on the test data I SELECT WITHOUT using ORDER to hopefully help speed things along. The ORDER is just for my convenience when I'm fact checking.

回答1:

Your problem is that you are using a CTE as the source of a recursive CTE. Your first CTE will be executed once for each iteration of your recursive CTE. With your test data that means that CTE_TestValue is created 8 times.

Put the result of CTE_TestValue in a temp table that has a clustered primary key on (RowNumber, ItemID) and use that temporary table as the source of data for the recursive CTE CTE_TestColumnC.

Also change the join condition in the recursive part to ON MainB.RowNumber = Sub.RowNumber + 1. That will make the query able to use the index on the temporary table.

DECLARE @TestTable TABLE (
    ItemID INT,
    A INT,
    B INT,
    Month INT)

INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)

CREATE TABLE #TestValue
(
  ItemID INT,
  A INT,
  B INT,
  Month INT,
  RowNumber INT,
  primary key(RowNumber, ItemID)
)

INSERT INTO #TestValue
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.Month,
    ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
FROM @TestTable AS Main


;WITH CTE_TestColumnC AS (
    SELECT 
        MainA.ItemID,
        MainA.A,
        MainA.B,
        (MainA.B - MainA.A) AS C,
        MainA.Month,
        MainA.RowNumber
    FROM #TestValue AS MainA
        WHERE MainA.Rownumber = 1

    UNION ALL

    SELECT 
        MainB.ItemID,
        MainB.A,
        MainB.B,
        (Sub.C - MainB.A) AS C,
        MainB.Month,
        MainB.RowNumber
    FROM #TestValue AS MainB
        INNER JOIN CTE_TestColumnC AS Sub
            ON MainB.RowNumber = Sub.RowNumber + 1
            AND MainB.ItemID = Sub.ItemID
)
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.C,
    Main.Month,
    Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber

DROP TABLE #TestValue

In the query plan for your query the problem is shown in the table scan in the lower right corner. with this test data it is executed 8 times with a total of 64 rows returned:

The query plans for the query with a temporary table:



回答2:

I hope I understood correctly what you are trying to do.
Here is my solution:

WITH DATA AS (
SELECT *, row_number() over (ORDER BY itemid) RN
FROM TestTable),
RECURSIVE AS (
   SELECT itemID, B-A AS C, RN
  FROM DATA
  WHERE RN = 1
  UNION ALL
  SELECT T1.itemID, t2.C - t1.A, t1.RN
  FROM DATA AS T1
  INNER JOIN
  RECURSIVE AS T2
  ON t1.RN = T2.Rn+1)
SELECT ItemID, C
FROM RECURSIVE

You can find the full example (with your data) here