I'm not a SQL expert. If anybody can help me through.
I've recursive CTE to get the values as below.
Child1 --> Parent 1
Parent1 --> Parent 2
Parent2 --> NULL
If data population has gone wrong, then I'll have something like below, because of which CTE may go to infinite recursive loop & gives max recursive error. Since the data is huge, I cannot check this bad data manually. Please let me know if there is a way to find it out.
Child1 --> Parent 1
Parent1 --> Child1
or
Child1 --> Parent 1
Parent1 --> Parent2
Parent2 --> Child1
Try to limit the recursive result
You can use the same approach described by Knuth for detecting a cycle in a linked list here. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.
For the initial selection, the distance between
Child
andGrandchild
columns is 1. Every selection fromunion all
increases the depth ofChild
by 1, and that ofGrandchild
by 2. The distance between them increases by 1.If you have any loop, since the distance only increases by 1 each time, at some point after
Child
is in the loop, the distance will be a multiple of the cycle length. When that happens, theChild
and theGrandchild
columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.SQL Server sample:
Remove one of the
LinkTable
records that causes the cycle, and you will find that theselect
no longer returns any data.You haven't specified the dialect or your column names, so it is difficult to make the perfect example...
Something like this will show if/where there are loops in the recursive cte. Look at the column
Loop
. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.In the end the recursive cte creates a
VARCHAR(MAX)
of ids in the form|id1|id2|id3|
(calledParents
) and then checks if the currentID
is already in that "list". If yes, it sets theLoop
column to 1. This column is checked in the recursive join (theABD R.Loop = 0
).The ending query uses a
MAX() OVER (PARTITION BY ...)
to set to 1 theLoop
column for a whole "block" of chains.A little more complex, that generates a "better" report:
This query should return all the "last child" rows, with the full parent chain. The column
Loop
is0
if there is no loop,1
if there is a loop.With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.
Setup:
Recursive query: