Question / example / expected values
I need to determine a Strahler number or Strahler stream order for a directed graph representing a stream network. I can derive information forwards and backwards using WITH RECURSIVE
queries, but it seems I need to do something different to determine the Strahler number.
For example, here is a 19 segment stream network with 10 tributaries and one outlet. The upstream portion of each segment is represented by a node ID.
And the same data in a table structure, where the segments are connected by to_node
, which is null for the basin outlet.
CREATE TABLE streams (
node integer PRIMARY KEY,
to_node integer REFERENCES streams(node),
expected_order integer
);
INSERT INTO streams(node, to_node, expected_order) VALUES
(1, NULL, 4),
(2, 1, 4),
(3, 2, 3),
(4, 2, 3),
(5, 4, 3),
(6, 3, 2),
(7, 3, 2),
(8, 5, 2),
(9, 5, 2),
(10, 6, 1),
(11, 6, 1),
(12, 7, 1),
(13, 7, 1),
(14, 8, 1),
(15, 8, 1),
(16, 9, 1),
(17, 9, 1),
(18, 4, 1),
(19, 1, 1);
The expected result (expected_order
) for the Strahler numbers is visualized here:
There are three rules (from the GRASS 7.0 Manual):
- if the node has no children, it's Strahler order is 1.
- if the node has one and only one tributary with Strahler greatest order i, and all other tributaries have order less than i, then the order remains i.
- if the node has two or more tributaries with greatest order i, then the Strahler order of the node is i + 1.
What I've found / tried
From what I have found in digging to solve this problem is that this calculation can be done with SQL (except I think their "SQL script" is written for MS SQL Server). However, I haven't found something that can be done with PostgreSQL 9.1.
One of the best attempts I have is to count the number of upstream nodes from each node, which correctly identifies the tributaries (1st order), but not others:
WITH RECURSIVE search_graph AS (
SELECT node AS start_node, node
FROM streams
-- Connect downstream towards outlet(s)
UNION ALL
SELECT sg.start_node, n.node
FROM streams n
JOIN search_graph sg ON n.to_node = sg.node
)
SELECT start_node, count(sg.node) as upstream_nodes, expected_order
FROM search_graph sg
JOIN streams s ON sg.start_node = s.node
GROUP BY start_node, expected_order
ORDER BY upstream_nodes DESC, start_node;
start_node | upstream_nodes | expected_order
------------+----------------+----------------
1 | 19 | 4
2 | 17 | 4
4 | 9 | 3
3 | 7 | 3
5 | 7 | 3
6 | 3 | 2
7 | 3 | 2
8 | 3 | 2
9 | 3 | 2
10 | 1 | 1
11 | 1 | 1
12 | 1 | 1
13 | 1 | 1
14 | 1 | 1
15 | 1 | 1
16 | 1 | 1
17 | 1 | 1
18 | 1 | 1
19 | 1 | 1
(19 rows)
An idea is to use a nth_value(value any, nth integer)
window function with an appropriately set window frame range. However, I'm not sure how to set this up, or if it can be set up to identify Strahler numbers. Another [less exciting] idea is to manually run iterations for each Strahler number, which I expect to be between five and eight orders (iterations) for my real world data. This can be done with a DO
statement. But any better ideas would be most welcome.
I hit a limitation with CTE. Recursive CTE can't do LEFT JOIN to itself. Just ended up doing it in function.
Live test: https://www.db-fiddle.com/f/8z58LCVhD62YvkeJjriW8d/0
Test:
Output:
This was the original plan
Live test: https://www.db-fiddle.com/f/8z58LCVhD62YvkeJjriW8d/1
Output:
The original plan is to evaluate each node in safe order (will be facilitated by eo field), start at nodes with fewer siblings, up to nodes with many siblings. Then on each node that will be evaluated, will also check its immediate children (recursive CTE will do a LEFT JOIN to itself), then perform the necessary Strahler's three conditions. However, CTE has a limitation, recursive CTE can't do LEFT JOIN to itself.