可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Consider the following simple DAG:
1->2->3->4
And a table, #bar, describing this (I'm using SQL Server 2005):
parent_id child_id
1 2
2 3
3 4
//... other edges, not connected to the subgraph above
Now imagine that I have some other arbitrary criteria that select the first and last edges, i.e. 1->2 and 3->4. I want to use these to find the rest of my graph.
I can write a recursive CTE as follows (I'm using terminology from MSDN):
with foo(parent_id,child_id) as (
// anchor member that happens to select first and last edges:
select parent_id,child_id from #bar where parent_id in (1,3)
union all
// recursive member:
select #bar.* from #bar
join foo on #bar.parent_id = foo.child_id
)
select parent_id,child_id from foo
However, this results in edge 3->4 being selected twice:
parent_id child_id
1 2
3 4
2 3
3 4 // 2nd appearance!
How can I prevent the query from recursing into subgraphs that have already been described? I could achieve this if, in my "recursive member" part of the query, I could reference all data that has been retrieved by the recursive CTE so far (and supply a predicate indicating in the recursive member excluding nodes already visited). However, I think I can access data that was returned by the last iteration of the recursive member only.
This will not scale well when there is a lot of such repetition. Is there a way of preventing this unnecessary additional recursion?
Note that I could use "select distinct" in the last line of my statement to achieve the desired results, but this seems to be applied after all the (repeated) recursion is done, so I don't think this is an ideal solution.
Edit - hainstech suggests stopping recursion by adding a predicate to exclude recursing down paths that were explicitly in the starting set, i.e. recurse only where foo.child_id not in (1,3)
. That works for the case above only because it simple - all the repeated sections begin within the anchor set of nodes. It doesn't solve the general case where they may not be. e.g., consider adding edges 1->4 and 4->5 to the above set. Edge 4->5 will be captured twice, even with the suggested predicate. :(
回答1:
The CTE
's are recursive.
When your CTE
's have multiple initial conditions, that means they also have different recursion stacks, and there is no way to use information from one stack in another stack.
In your example, the recursion stacks will go as follows:
(1) - first IN condition
(1, 2)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3) - no more children
(1, 2) - no more children
(1) - no more children, going to second IN condition
(3) - second condition
(3, 4)
(3) - no more children, returning
As you can see, these recursion stack do not intersect.
You could probably record the visited values in a temporary table, JOIN
each value with the temptable and do not follow this value it if it's found, but SQL Server
does not support these things.
So you just use SELECT DISTINCT
.
回答2:
This is the approach I used. It has been tested against several methods and was the most performant. It combines the temp table idea suggested by Quassnoi and the use of both distinct and a left join to eliminate redundant paths to the recursion. The level of the recursion is also included.
I left the failed CTE approach in the code so you could compare results.
If someone has a better idea, I'd love to know it.
create table #bar (unique_id int identity(10,10), parent_id int, child_id int)
insert #bar (parent_id, child_id)
SELECT 1,2 UNION ALL
SELECT 2,3 UNION ALL
SELECT 3,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 2,5 UNION ALL
SELECT 5,6
SET NOCOUNT ON
;with foo(unique_id, parent_id,child_id, ord, lvl) as (
-- anchor member that happens to select first and last edges:
select unique_id, parent_id, child_id, row_number() over(order by unique_id), 0
from #bar where parent_id in (1,3)
union all
-- recursive member:
select b.unique_id, b.parent_id, b.child_id, row_number() over(order by b.unique_id), foo.lvl+1
from #bar b
join foo on b.parent_id = foo.child_id
)
select unique_id, parent_id,child_id, ord, lvl from foo
/***********************************
Manual Recursion
***********************************/
Declare @lvl as int
Declare @rows as int
DECLARE @foo as Table(
unique_id int,
parent_id int,
child_id int,
ord int,
lvl int)
--Get anchor condition
INSERT @foo (unique_id, parent_id, child_id, ord, lvl)
select unique_id, parent_id, child_id, row_number() over(order by unique_id), 0
from #bar where parent_id in (1,3)
set @rows=@@ROWCOUNT
set @lvl=0
--Do recursion
WHILE @rows > 0
BEGIN
set @lvl = @lvl + 1
INSERT @foo (unique_id, parent_id, child_id, ord, lvl)
SELECT DISTINCT b.unique_id, b.parent_id, b.child_id, row_number() over(order by b.unique_id), @lvl
FROM #bar b
inner join @foo f on b.parent_id = f.child_id
--might be multiple paths to this recursion so eliminate duplicates
left join @foo dup on dup.unique_id = b.unique_id
WHERE f.lvl = @lvl-1 and dup.child_id is null
set @rows=@@ROWCOUNT
END
SELECT * from @foo
DROP TABLE #bar
回答3:
Do you happen to know which of the two edges is on a deeper level in the tree? Because in that case, you could make edge 3->4
the anchor member and start walking up the tree until you find edge 1->2
.
Something like this:
with foo(parent_id, child_id)
as
(
select parent_id, child_id
from #bar
where parent_id = 3
union all
select parent_id, child_id
from #bar b
inner join foo f on b.child_id = f.parent_id
where b.parent_id <> 1
)
select *
from foo
回答4:
Is this what you want to do?
create table #bar (parent_id int, child_id int)
insert #bar values (1,2)
insert #bar values (2,3)
insert #bar values (3,4)
declare @start_node table (parent_id int)
insert @start_node values (1)
insert @start_node values (3)
;with foo(parent_id,child_id) as (
select
parent_id
,child_id
from #bar where parent_id in (select parent_id from @start_node)
union all
select
#bar.*
from #bar
join foo on #bar.parent_id = foo.child_id
where foo.child_id not in (select parent_id from @start_node)
)
select parent_id,child_id from foo
Edit - @bacar - I don't think this is the temp table solution Quasnoi was proposing. I believe they were suggesting basically duplicate the entire recursion member contents during each recursion, and use that as a join to prevent reprocessing (and that this is not supported in ss2k5). My approach is supported, and the only change to your original is in the predicate in the recursion member to exclude recursing down paths that were explicitly in your starting set. I only added the table variable so that you would define the starting parent_ids in one location, you could just as easily have used this predicate with your original query:
where foo.child_id not in (1,3)
回答5:
EDIT -- This doesn't work at all. This is a method to stop chasing triangle routes. It doesn't do what the OP wanted.
Or you can use a recursive token separated string.
I'm at home on my laptop ( no sql server ) so this might not be completely right but here goes.....
; WITH NodeNetwork AS (
-- Anchor Definition
SELECT
b.[parent_Id] AS [Parent_ID]
, b.[child_Id] AS [Child_ID]
, CAST(b.[Parent_Id] AS VARCHAR(MAX)) AS [NodePath]
FROM
#bar AS b
-- Recursive Definition
UNION ALL SELECT
b.[Parent_Id]
, b.[child_Id]
, CAST(nn.[NodePath] + '-' + CAST(b.[Parent_Id] AS VARCHAR(MAX)) AS VARCHAR(MAX))
FROM
NodeNetwork AS nn
JOIN #bar AS b ON b.[Parent_Id] = nn.[Child_ID]
WHERE
nn.[NodePath] NOT LIKE '%[-]' + CAST(b.[Parent_Id] AS VARCHAR(MAX)) + '%'
)
SELECT * FROM NodeNetwork
Or similar. Sorry It's late and I can't test it. I'll check on Monday morning. Credit for this must go to Peter Larsson (Peso)
The idea was generated here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=115290
回答6:
(I'm no expert on graphs, just exploring a bit)
The DISTINCT will guarantee each row is distinct, but it won't eliminate graph routes that don't end up in your last edge. Take this graph:
insert into #bar (parent_id,child_id) values (1,2)
insert into #bar (parent_id,child_id) values (1,5)
insert into #bar (parent_id,child_id) values (2,3)
insert into #bar (parent_id,child_id) values (2,6)
insert into #bar (parent_id,child_id) values (6,4)
The results of the query here include (1,5), which is not part of the route from the first edge (1,2) to the last edge (6,4).
You could try something like this, to find only routes that start with (1,2) and end with (6,4):
with foo(parent_id, child_id, route) as (
select parent_id, child_id,
cast(cast(parent_id as varchar) +
cast(child_id as varchar) as varchar(128))
from #bar
union all
select #bar.parent_id, #bar.child_id,
cast(route + cast(#bar.child_id as varchar) as varchar(128))
from #bar
join foo on #bar.parent_id = foo.child_id
)
select * from foo where route like '12%64'