As a warning, I'm still a bit inexperienced in python
I'm trying to perform the transitive reduction of directed graph using the networkx library. I've figured out an algorithm but I'm having trouble implementing it. After a quick search, I found algorithms similar to mine in other stack exchange questions but no demonstrations of how to actually code the algorithm.
Here's my algorthm:
For X in Nodes
For Y in Nodes
For z in Nodes
if (x,y) != (y,z) and (x,y) != (x,z)
if edges xy and yz are in Graph
delete xz
Here's my attempt at expressing this in python :
G = graph
N = G.Nodes()
for x in N:
for y in N:
for z in N:
if (x,y) != (y,z) and (x,y) != (x,z):
if (x,y) and (y,z) in G:
G.remove_edge(x,z)
I don't think I'm properly calling every permutation of edges in the network and was thinking of trying to use itertools. Even if I had every possible permutation, I don't know how to implement the algorithm with that information.
Any help would be wonderful. Thanks!
The following seems to work, at least for the sample data I provided. If you have a specific case that doesn't it'd be helpful to see it.
import random
import pprint
class Graph:
nodes = []
edges = []
removed_edges = []
def remove_edge(self,x,y):
e = (x,y)
try:
self.edges.remove(e)
print("Removed edge %s" % str(e))
self.removed_edges.append(e)
except:
print("Attempted to remove edge %s, but it wasn't there" % str(e))
def Nodes(self):
return self.nodes
# Sample data
def __init__(self):
self.nodes = [1,2,3,4,5]
self.edges = [
(1,2),
(1,3),
(1,4),
(1,5),
(2,4),
(3,4),
(3,5),
(4,5),
]
G = Graph()
N = G.Nodes()
for x in N:
for y in N:
for z in N:
#print("(%d,%d,%d)" % (x,y,z))
if (x,y) != (y,z) and (x,y) != (x,z):
if (x,y) in G.edges and (y,z) in G.edges:
G.remove_edge(x,z)
print("Removed edges:")
pprint.pprint(G.removed_edges)
print("Remaining edges:")
pprint.pprint(G.edges)
Output:
Removed edge (1, 4)
Attempted to remove edge (1, 4), but it wasn't there
Removed edge (1, 5)
Attempted to remove edge (2, 5), but it wasn't there
Removed edge (3, 5)
Removed edges:
[(1, 4), (1, 5), (3, 5)]
Remaining edges:
[(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
If I read the source of the tred
utility (which calculates the transitive reduction of a graph in dot fromat as part of the graphviz utilities) correctly, then the algorithm it uses is the following: Go through all vertices of the graph and for each vertex, do a DFS on each of its children. For each vertex traversed that way remove any edge from the original parent vertex to that vertex.
I implemented that algorithm using networkx like this:
g = nx.read_graphml("input.dot")
for n1 in g.nodes_iter():
if g.has_edge(n1, n1):
g.remove_edge(n1, n1)
for n2 in g.successors(n1):
for n3 in g.successors(n2):
for n4 in nx.dfs_preorder_nodes(g, n3):
if g.has_edge(n1, n4):
g.remove_edge(n1, n4)
nx.write_graphml(g, "output.dot")
The nesting level suggests a horrible complexity but what it actually does is to already perform the first two steps of the DFS which will be carried out later. Doing it that way avoids to check whether the current node is the direct successor of the parent or not in the DFS part (because the DFS result of networkx includes the vertex the traversal was started from). So an alternative formulation would be the following which better shows the complexity of this algorithm but is also a tiny tad slower because of the additional check in the most inner looptgoo:
g = nx.read_graphml("input.dot")
for n1 in g.nodes_iter():
c = set(g.successors(n1))
for n2 in nx.dfs_preorder_nodes(g, n1):
if n2 in c:
continue
if g.has_edge(n1, n2):
g.remove_edge(n1, n2)
nx.write_graphml(g, "output.dot")
The operations has_edge
and remove_edge
use Python dictionaries and thus are O(1)
on average. The worst case time complexity of a DFS is O(E)
with E
being the number of edges in the graph. Since a DFS is carried out N
times (with N
being the number of vertices) the time complexity of above algorithm is O(NE)
.
Another funny observation is, how the above Python code seems to be orders of magnitude faster than the tred
utility from the graphviz suite. I have no idea why this is. Here is a comparison of run times on a acyclic graph with 482 vertices and 9656 edges:
- tred: 58.218 seconds
- python: 4.720 seconds
I also tried another benchmark on a graph with 14956 vertices and 190993 edges but while the Python code from above finished in 11 minutes, tred
is still running after 2 hours of run time at the time of this writing.
EDIT: 7 days and 3 hours later, tred
is still churning along on that graph. I'm giving up now.
if (x,y) and (y,z) in G:
This needs to be written as:
if (x,y) in G and (y,z) in G: