As a warning, I'm still a bit inexperienced in python
I'm trying to perform the transitive reduction of directed graph using the networkx library. I've figured out an algorithm but I'm having trouble implementing it. After a quick search, I found algorithms similar to mine in other stack exchange questions but no demonstrations of how to actually code the algorithm.
Here's my algorthm:
For X in Nodes
For Y in Nodes
For z in Nodes
if (x,y) != (y,z) and (x,y) != (x,z)
if edges xy and yz are in Graph
delete xz
Here's my attempt at expressing this in python :
G = graph
N = G.Nodes()
for x in N:
for y in N:
for z in N:
if (x,y) != (y,z) and (x,y) != (x,z):
if (x,y) and (y,z) in G:
G.remove_edge(x,z)
I don't think I'm properly calling every permutation of edges in the network and was thinking of trying to use itertools. Even if I had every possible permutation, I don't know how to implement the algorithm with that information.
Any help would be wonderful. Thanks!
If I read the source of the
tred
utility (which calculates the transitive reduction of a graph in dot fromat as part of the graphviz utilities) correctly, then the algorithm it uses is the following: Go through all vertices of the graph and for each vertex, do a DFS on each of its children. For each vertex traversed that way remove any edge from the original parent vertex to that vertex.I implemented that algorithm using networkx like this:
The nesting level suggests a horrible complexity but what it actually does is to already perform the first two steps of the DFS which will be carried out later. Doing it that way avoids to check whether the current node is the direct successor of the parent or not in the DFS part (because the DFS result of networkx includes the vertex the traversal was started from). So an alternative formulation would be the following which better shows the complexity of this algorithm but is also a tiny tad slower because of the additional check in the most inner looptgoo:
The operations
has_edge
andremove_edge
use Python dictionaries and thus areO(1)
on average. The worst case time complexity of a DFS isO(E)
withE
being the number of edges in the graph. Since a DFS is carried outN
times (withN
being the number of vertices) the time complexity of above algorithm isO(NE)
.Another funny observation is, how the above Python code seems to be orders of magnitude faster than the
tred
utility from the graphviz suite. I have no idea why this is. Here is a comparison of run times on a acyclic graph with 482 vertices and 9656 edges:I also tried another benchmark on a graph with 14956 vertices and 190993 edges but while the Python code from above finished in 11 minutes,
tred
is still running after 2 hours of run time at the time of this writing.EDIT: 7 days and 3 hours later,
tred
is still churning along on that graph. I'm giving up now.This needs to be written as:
The following seems to work, at least for the sample data I provided. If you have a specific case that doesn't it'd be helpful to see it.
Output: