Say that I have a regular network of NxN
nodes. The shortest path between two nodes is the minimum number of hops required to reach one target node from a source node. Now, each shortest path passes through a number of nodes along the way.
My goal: for each node in the network, I want to count the number of shortest paths that pass through a specific node, and save that number in a dict
.
In this little example, node B has 4 shortest paths passing through it:
A -> B
, A -> C
, C -> B
, C -> A
. I want to be able to compute this number for each node in a generic graph.
I know I could use the nx.betweenness_centrality(), but this will give me a numerator (which is, for each node, what I want) divided by a denominator which stores all the possible shortest paths between two nodes. I have even accessed the source code but I wasn't able to figure out where the division is performed.
I know this is a wordy question but I had no other means to explain my problem. Thank you to anyone who will help.
EDIT
This is the source code for nx.betweenness_centrality()
. My graph is undirected. It is unclear which line hosts the division I introduced above:
def betweenness_centrality(G, k=None, normalized=True, weight=None,
endpoints=False,
seed=None): #G is the graph
betweenness = dict.fromkeys(G, 0.0)
if k is None:
nodes = G
else:
random.seed(seed)
nodes = random.sample(G.nodes(), k)
for s in nodes:
# single source shortest paths
if weight is None: # use BFS
S, P, sigma = _single_source_shortest_path_basic(G, s)
else: # use Dijkstra's algorithm
S, P, sigma = _single_source_dijkstra_path_basic(G, s, weight)
# accumulation
if endpoints:
betweenness = _accumulate_endpoints(betweenness, S, P, sigma, s)
else:
betweenness = _accumulate_basic(betweenness, S, P, sigma, s)
# rescaling
betweenness = _rescale(betweenness, len(G),
normalized=normalized,
directed=G.is_directed(),
k=k)
return betweenness #Returns a dict with the node ID as a key and the value
You could just use nx.all_pairs_shortest_path(G)
:
>>> G = nx.Graph()
>>> G.add_path([0,1,2])
>>> spaths = nx.all_pairs_shortest_path(G)
>>> spaths
{0: {0: [0], 1: [0, 1], 2: [0, 1, 2]},
1: {0: [1, 0], 1: [1], 2: [1, 2]},
2: {0: [2, 1, 0], 1: [2, 1], 2: [2]}}
Which finds you all the shortest paths between all pair of nodes in a graph (which is very expensive for large graphs). Then, the following code gives you the desired result:
def num_spaths(G):
n_spaths = dict.fromkeys(G, 0.0)
spaths = nx.all_pairs_shortest_path(G)
for source in G:
for path in spaths[source].values():
for node in path[1:]: # ignore firs element (source == node)
n_spaths[node] += 1 # this path passes through `node`
return n_spaths
In your example:
>>> num_spaths(G)
{0: 2.0, 1: 4.0, 2: 2.0}
Additionally, if you could go inside the all_pairs_shortest_path
code and sightly edit it to add a counter for shortest paths and
for node in path[1:]:
n_spaths[node] += 1
this way, you would update the number of paths online at the same time you find one, rather than having to iterate over all them (as my code does) after all them are calculated.
EDIT: in networkx (github), line 251 says:
paths[w]=paths[v]+[w]
You could potentially modify that function to update the number of shortest paths online (at the same time they are found) by passing n_spath
as an argument to the modified function and updating it inside. Then calling the function would be as:
def num_spaths(G):
n_spaths = dict.fromkeys(G, 0.0)
for n in G:
my_single_source_shortest_path(G, n, n_spaths)
return n_spaths
and n_spaths
would have the number of shortest path updated.
EDIT: The above only works if there is only 1 shortest path between A and B, as nx.all_pairs_shortest_path
only returns one shortest paths per each node pair. Find bellow the brute force search for all posible shortest paths. I wouldn't recommend runing this in a large graph:
def bf_num_spaths(G):
n_spaths = dict.fromkeys(G, 0.0)
for source in G:
for target in G:
if source == target:
continue
for path in nx.all_shortest_paths(G, source, target):
for node in path[1:]: # ignore firs element (source == node)
n_spaths[node] += 1 # this path passes through `node`
return n_spaths
I'll try to give pure algorithmic answer.
What if there are several shortest paths from v_1
to v_2
and some of them contain v_mid
? If this case is counted as 1
for each (v_1, v_2)
pair, you can use Floyd-Warshall algorithm. After its run, you have matrix with length of shortest paths. For each u
, iterate over all (v_1, v_2)
pairs and count +1
if D[v_1, u] + D[u, v_2] == D[v_1, v_2]
. It's O(n^3) and good for dense graphs.
I'm not sure but I think that it can be adjusted to count number of shortest paths.
Without further access to the source code it is impossible to say for sure but my bet would be on:
S, P, sigma = _single_source_shortest_path_basic(G, s)
Or
betweenness = _rescale(betweenness, len(G),...
Rescale just sounds like it will contain a division.
Have you tried calling this function with normalized=False
?
I believe you can modify the Floyd-Warshall Algorithm to save the length of the path in another matrix. Pseudocode (based on wikipedia):
let dist and num be two |V| × |V| arrays of minimum
distances and lengths initialized to ∞ (infinity)
for each vertex v
dist[v][v] ← 0
num[v][v] ← 0
for each edge (u,v)
dist[u][v] ← w(u,v) // the weight of the edge (u,v)
num[u][v] ← 1
for k from 1 to |V|
for i from 1 to |V|
for j from 1 to |V|
if dist[i][j] > dist[i][k] + dist[k][j]
dist[i][j] ← dist[i][k] + dist[k][j]
num[i][j] ← num[i][k] + num[k][j]
end if