Recursive CTE stop condition for loops

2019-05-13 10:29发布

问题:

I need to iterate a graph with loops using the recursive CTE.

The problem is the loop part.

I want if there are loops, then the shortest path to be selected. This basically means ignoring the loops because the recursion is "width first".

The example below shows the returned data:

The problem is the commented out INSERT statement that creates a loop. With it uncommented, obviously, the query will never finish.

What I need is to return the same data as it would be without the loop.

DROP TABLE IF EXISTS edges;

CREATE TABLE edges(
  src integer,
  dst integer,
  data integer
);

INSERT INTO edges VALUES (1, 2, 1);
INSERT INTO edges VALUES (2, 3, 1);
--INSERT INTO edges VALUES (3, 2, 1);  -- This entry creates a loop
INSERT INTO edges VALUES (1, 4, 1);
INSERT INTO edges VALUES (4, 5, 1);
INSERT INTO edges VALUES (5, 2, 1);

INSERT INTO edges VALUES (1, 4, 2);
INSERT INTO edges VALUES (4, 5, 2);
INSERT INTO edges VALUES (4, 6, 2);


WITH RECURSIVE paths AS (
  -- For simplicity assume node 1 is the start
  -- we'll have two starting nodes for data = 1 and 2
  SELECT DISTINCT
    src           as node
    , data        as data
    , 0           as depth
    , src::text   as path
  FROM edges
  WHERE
    src = 1

  UNION ALL

  SELECT DISTINCT
    edges.dst
    , edges.data
    , depth + 1
    , paths.path || '->' || edges.dst::text
  FROM paths
    JOIN edges ON edges.src = paths.node AND edges.data = paths.data
    -- AND eliminate loops?
)

SELECT * FROM paths;

Returning:

 node | data | depth |     path      
------+------+-------+---------------
    1 |    1 |     0 | 1
    1 |    2 |     0 | 1
    2 |    1 |     1 | 1->2
    4 |    1 |     1 | 1->4
    4 |    2 |     1 | 1->4
    3 |    1 |     2 | 1->2->3
    5 |    2 |     2 | 1->4->5
    6 |    2 |     2 | 1->4->6
    5 |    1 |     2 | 1->4->5
    2 |    1 |     3 | 1->4->5->2
    3 |    1 |     4 | 1->4->5->2->3
(11 rows)

回答1:

WITH RECURSIVE paths AS (
    -- For simplicity assume node 1 is the start
    -- we'll have two starting nodes for data = 1 and 2
    SELECT DISTINCT
        src           as node
        , data        as data
        , 0           as depth
        , src::text   as path
        , ''          as edgeAdded   
    FROM edges
    WHERE
        src = 1

    UNION ALL

    SELECT DISTINCT
        edges.dst
        , edges.data
        , depth + 1
        , paths.path || '->' || edges.dst::text
        , edges.src::text || '->' || edges.dst::text
    FROM paths
    JOIN edges ON edges.src = paths.node AND edges.data = paths.data
    AND NOT paths.path LIKE '%' || edges.dst::text || '%' 
        -- AND eliminate loops?
)
SELECT * FROM paths;

Here with the condition AND NOT paths.path LIKE '%' || edges.dst::text || '%' we are avoiding back edges which would lead to a loop.
http://www.sqlfiddle.com/#!12/086ee/1