How to determine Strahler number on a directed gra

2019-07-13 11:48发布

Question / example / expected values

I need to determine a Strahler number or Strahler stream order for a directed graph representing a stream network. I can derive information forwards and backwards using WITH RECURSIVE queries, but it seems I need to do something different to determine the Strahler number.

For example, here is a 19 segment stream network with 10 tributaries and one outlet. The upstream portion of each segment is represented by a node ID.

stream_nodes

And the same data in a table structure, where the segments are connected by to_node, which is null for the basin outlet.

CREATE TABLE streams (
  node integer PRIMARY KEY,
  to_node integer REFERENCES streams(node),
  expected_order integer
);
INSERT INTO streams(node, to_node, expected_order) VALUES
(1, NULL, 4),
(2, 1, 4),
(3, 2, 3),
(4, 2, 3),
(5, 4, 3),
(6, 3, 2),
(7, 3, 2),
(8, 5, 2),
(9, 5, 2),
(10, 6, 1),
(11, 6, 1),
(12, 7, 1),
(13, 7, 1),
(14, 8, 1),
(15, 8, 1),
(16, 9, 1),
(17, 9, 1),
(18, 4, 1),
(19, 1, 1);

The expected result (expected_order) for the Strahler numbers is visualized here:

expected_stream_order

There are three rules (from the GRASS 7.0 Manual):

  1. if the node has no children, it's Strahler order is 1.
  2. if the node has one and only one tributary with Strahler greatest order i, and all other tributaries have order less than i, then the order remains i.
  3. if the node has two or more tributaries with greatest order i, then the Strahler order of the node is i + 1.

What I've found / tried

From what I have found in digging to solve this problem is that this calculation can be done with SQL (except I think their "SQL script" is written for MS SQL Server). However, I haven't found something that can be done with PostgreSQL 9.1.

One of the best attempts I have is to count the number of upstream nodes from each node, which correctly identifies the tributaries (1st order), but not others:

WITH RECURSIVE search_graph AS (
  SELECT node AS start_node, node
  FROM streams
  -- Connect downstream towards outlet(s)
  UNION ALL
  SELECT sg.start_node, n.node
  FROM streams n
  JOIN search_graph sg ON n.to_node = sg.node
)
SELECT start_node, count(sg.node) as upstream_nodes, expected_order
FROM search_graph sg
JOIN streams s ON sg.start_node = s.node
GROUP BY start_node, expected_order
ORDER BY upstream_nodes DESC, start_node;

 start_node | upstream_nodes | expected_order
------------+----------------+----------------
          1 |             19 |              4
          2 |             17 |              4
          4 |              9 |              3
          3 |              7 |              3
          5 |              7 |              3
          6 |              3 |              2
          7 |              3 |              2
          8 |              3 |              2
          9 |              3 |              2
         10 |              1 |              1
         11 |              1 |              1
         12 |              1 |              1
         13 |              1 |              1
         14 |              1 |              1
         15 |              1 |              1
         16 |              1 |              1
         17 |              1 |              1
         18 |              1 |              1
         19 |              1 |              1
(19 rows)

An idea is to use a nth_value(value any, nth integer) window function with an appropriately set window frame range. However, I'm not sure how to set this up, or if it can be set up to identify Strahler numbers. Another [less exciting] idea is to manually run iterations for each Strahler number, which I expect to be between five and eight orders (iterations) for my real world data. This can be done with a DO statement. But any better ideas would be most welcome.

1条回答
我只想做你的唯一
2楼-- · 2019-07-13 11:53

I hit a limitation with CTE. Recursive CTE can't do LEFT JOIN to itself. Just ended up doing it in function.

Live test: https://www.db-fiddle.com/f/8z58LCVhD62YvkeJjriW8d/0

create or replace function strahler(_parent int) returns table(node int, sn int)
as
$$
    select 
        s.node,

        case 
            -- If the node is a leaf (has no children), its Strahler number is one.
            when count(st.*) = 0 then 
                1

            when count(st.*) >= 2 then
                case 
                    -- If the node has one child with Strahler number i, 
                    -- and all other children have Strahler numbers less than i, 
                    -- then the Strahler number of the node is i again.
                    when min(st.sn) < max(st.sn) then
                        max(st.sn)

                    -- If the node has two or more children with Strahler number i, 
                    -- and no children with greater number, 
                    -- then the Strahler number of the node is i + 1.
                    when min(st.sn) = max(st.sn) then
                        max(st.sn) + 1                                          
                end

        end as sn   

    from streams s
    left join lateral strahler(s.node) st  on true
    where _parent = 0 or s.to_node = _parent
    group by s.node
$$ language 'sql';

Test:

select st.*, s.expected_order 
from strahler(0) st 
join streams s on st.node = s.node 
order by st.node;

Output:

| node | sn  | expected_order |
| ---- | --- | -------------- |
| 1    | 4   | 4              |
| 2    | 4   | 4              |
| 3    | 3   | 3              |
| 4    | 3   | 3              |
| 5    | 3   | 3              |
| 6    | 2   | 2              |
| 7    | 2   | 2              |
| 8    | 2   | 2              |
| 9    | 2   | 2              |
| 10   | 1   | 1              |
| 11   | 1   | 1              |
| 12   | 1   | 1              |
| 13   | 1   | 1              |
| 14   | 1   | 1              |
| 15   | 1   | 1              |
| 16   | 1   | 1              |
| 17   | 1   | 1              |
| 18   | 1   | 1              |
| 19   | 1   | 1              |

This was the original plan

Live test: https://www.db-fiddle.com/f/8z58LCVhD62YvkeJjriW8d/1

with recursive search_graph as (
    select node as start_node, node
    from streams

    union all
    select sg.start_node, n.node
    from streams n
    join search_graph sg on n.to_node = sg.node
)
, get_kids as 
(
    select 
        s.node as kid, 
        count(sg.*) - 1 as kid_kids, 
        s.expected_order
    from streams s 
    join search_graph sg on s.node = sg.start_node 
    group by s.node, s.expected_order
    order by kid_kids
)
, order_safe as 
(
    select 
        row_number() over(s) eo, 

        gk.kid, 
        gk.kid_kids, 

        gk_kid.to_node as parent, 
        gk_p.kid_kids as siblings 
    from get_kids gk
    left join streams gk_kid on gk.kid = gk_kid.node
    left join get_kids gk_p on gk_kid.to_node = gk_p.kid
    window s as (order by gk_p.kid_kids /* siblings */, gk_kid.to_node  /* parent */) 
)    
select * from order_safe;

Output:

| eo  | kid | kid_kids | parent | siblings |
| --- | --- | -------- | ------ | -------- |
| 1   | 11  | 0        | 6      | 2        |
| 2   | 10  | 0        | 6      | 2        |
| 3   | 12  | 0        | 7      | 2        |
| 4   | 13  | 0        | 7      | 2        |
| 5   | 15  | 0        | 8      | 2        |
| 6   | 14  | 0        | 8      | 2        |
| 7   | 17  | 0        | 9      | 2        |
| 8   | 16  | 0        | 9      | 2        |
| 9   | 6   | 2        | 3      | 6        |
| 10  | 7   | 2        | 3      | 6        |
| 11  | 9   | 2        | 5      | 6        |
| 12  | 8   | 2        | 5      | 6        |
| 13  | 5   | 6        | 4      | 8        |
| 14  | 18  | 0        | 4      | 8        |
| 15  | 3   | 6        | 2      | 16       |
| 16  | 4   | 8        | 2      | 16       |
| 17  | 19  | 0        | 1      | 18       |
| 18  | 2   | 16       | 1      | 18       |
| 19  | 1   | 18       |        |          |

The original plan is to evaluate each node in safe order (will be facilitated by eo field), start at nodes with fewer siblings, up to nodes with many siblings. Then on each node that will be evaluated, will also check its immediate children (recursive CTE will do a LEFT JOIN to itself), then perform the necessary Strahler's three conditions. However, CTE has a limitation, recursive CTE can't do LEFT JOIN to itself.

查看更多
登录 后发表回答