To find infinite recursive loop in CTE

2019-02-18 07:54发布

问题:

I'm not a SQL expert. If anybody can help me through.

I've recursive CTE to get the values as below.

Child1 --> Parent 1

Parent1 --> Parent 2

Parent2 --> NULL

If data population has gone wrong, then I'll have something like below, because of which CTE may go to infinite recursive loop & gives max recursive error. Since the data is huge, I cannot check this bad data manually. Please let me know if there is a way to find it out.

Child1 --> Parent 1

Parent1 --> Child1

or

Child1 --> Parent 1

Parent1 --> Parent2

Parent2 --> Child1

回答1:

You haven't specified the dialect or your column names, so it is difficult to make the perfect example...

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 2, 'SubChild')
-- End random data

;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
    SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
    UNION ALL
    SELECT R.StartingID, R.Level + 1, 
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.*
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description 
    FROM RecursiveCTE 
    ORDER BY StartingID, Level

Something like this will show if/where there are loops in the recursive cte. Look at the column Loop. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.

In the end the recursive cte creates a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks if the current ID is already in that "list". If yes, it sets the Loop column to 1. This column is checked in the recursive join (the ABD R.Loop = 0).

The ending query uses a MAX() OVER (PARTITION BY ...) to set to 1 the Loop column for a whole "block" of chains.

A little more complex, that generates a "better" report:

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 3, 'SubChild')
-- End random data

-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
    SELECT MT1.* FROM #MyTable MT1
        WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)

, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
    SELECT ID, -- StartingID 
        1, -- Level
        '|' + CAST(ID AS VARCHAR(MAX)) + '|', 
        '|' + CAST(Description AS VARCHAR(MAX)) + '|', 
        0, -- Loop
        ParentID
        FROM WithoutChildren
    UNION ALL
    SELECT R.StartingID, -- StartingID
        R.Level + 1, -- Level
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|', 
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.ParentID
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT * FROM RecursiveCTE 
    WHERE ParentID IS NULL OR Loop = 1

This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.



回答2:

With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.

Setup:

create table hierarchy (id integer, parent_id integer);

insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3), 
(5, 4), 
(3, 5); -- endless loop

Recursive query:

with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents
  from hierarchy
  where parent_id is null

  union all

  select c.id, 
         c.parent_id,
         p.all_parents||c.id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;


回答3:

You can use the same approach described by Knuth for detecting a cycle in a linked list here. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.

For the initial selection, the distance between Child and Grandchild columns is 1. Every selection from union all increases the depth of Child by 1, and that of Grandchild by 2. The distance between them increases by 1.

If you have any loop, since the distance only increases by 1 each time, at some point after Child is in the loop, the distance will be a multiple of the cycle length. When that happens, the Child and the Grandchild columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.

SQL Server sample:

declare @LinkTable table (Parent int, Child int);
insert into @LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);

with cte as (
    select lt1.Parent, lt1.Child, lt2.Child as Grandchild
    from @LinkTable lt1
    inner join @LinkTable lt2 on lt2.Parent = lt1.Child
    union all
    select cte.Parent, lt1.Child, lt3.Child as Grandchild
    from cte
    inner join @LinkTable lt1 on lt1.Parent = cte.Child
    inner join @LinkTable lt2 on lt2.Parent = cte.Grandchild
    inner join @LinkTable lt3 on lt3.Parent = lt2.Child
    where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;

Remove one of the LinkTable records that causes the cycle, and you will find that the select no longer returns any data.



回答4:

Try to limit the recursive result

WITH EMP_CTE AS
( 

    SELECT 
        0 AS [LEVEL],   
        ManagerId, EmployeeId, Name
    FROM Employees
    WHERE ManagerId IS NULL

    UNION ALL

    SELECT 
        [LEVEL] + 1 AS [LEVEL],
        ManagerId, EmployeeId, Name
    FROM Employees e
    INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId 
 AND s.LEVEL < 100 --RECURSION LIMIT
) 

    SELECT  * FROM EMP_CTE WHERE [Level] = 100