SQL - detecting loops in parent child relations

2019-02-18 11:37发布

问题:

I have parent child data in excel which gets loaded into a 3rd party system running MS SQL server. The data represents a directed (hopefully) acyclic graph. 3rd party means I don't have a completely free hand in the schema. The excel data is a concatenation of other files and the possibility exists that in the cross-references between the various files someone has caused a loop - i.e. X is a child of Y (X->Y) then elsewhere (Y->A->B-X). I can write vb, vba etc on the excel or on the SQL server db. The excel file is almost 30k rows so I'm worried about a combinatorial explosion as the data is set to grow. So some of the techniques like creating a table with all the paths might be pretty unwieldy. I'm thinking of simply writing a program that, for each root, does a tree traversal to each leaf and if the depth gets greater than some nominal value flags it.
Better suggestions or pointers to previous discussion welcomed.

回答1:

You can use a recursive CTE to detect loops:

with prev as (
    select RowId, 1 AS GenerationsRemoved
    from YourTable
    union all
    select RowId, prev.GenerationsRemoved + 1
    from prev
    inner join YourTable on prev.RowId = ParentRowId
    and prev.GenerationsRemoved < 55
)
select * 
from prev
where GenerationsRemoved > 50

This does require you to specify a maximum recursion level: in this case the CTE runs to 55, and it selects as erroneous rows with more than 50 children.