Kahn proposed an algorithm in 62 to topologically sort any DAG (directed acyclic graph), pseudo code copied from Wikipedia:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
remove a node n from S
add n to tail of L
for each node m with an edge e from n to m do
remove edge e from the graph # This is a DESTRUCTIVE step!
if m has no other incoming edges then
insert m into S if graph has edges then
return error (graph has at least one cycle) else
return L (a topologically sorted order)
I need to implement it using IPython3, with the following implementation of a DAG:
class Node(object):
def __init__(self, name, parents):
assert isinstance(name, str)
assert all(isinstance(_, RandomVariable) for _ in parents)
self.name, self.parents = name, parents
where name
is the label for the node and parents
stores all of its parent nodes. Then the DAG class is implemented as:
class DAG(object):
def __init__(self, *nodes):
assert all(isinstance(_, Node) for _ in nodes)
self.nodes = nodes
(The DAG implementation is fixed and not to be improved.) Then I need to implement Kahn's algorithm as a function top_order
which takes in a DAG instance and returns an ordering like (node_1, node_2, ..., node_n)
. The main trouble is, this algorithm is destructive because one of its steps is remove edge e from the graph
(line 5) which will delete one member of m.parents
. However, I have to leave the DAG instance intact.
One way I can think of so far is to create a deep copy of the DAG instance taken in (even a shallow copy can't do the job because the algorithm still destroys the original instance via references), and perform the destructive algorithm on this copy, and then get the correct ordering of node names of this copy (assume there is no naming conflict between nodes), and then use this ordering of names to infer the correct ordering of the nodes of the original instance, which roughly goes like:
def top_order(network):
'''takes in a DAG, prints and returns a topological ordering.'''
assert type(network) == DAG
temp = copy.deepcopy(network) # to leave the original instance intact
ordering_name = []
roots = [node for node in temp.nodes if not node.parents]
while roots:
n_node = roots[0]
del roots[0]
ordering_name.append(n_node.name)
for m_node in temp.nodes:
if n_node in m_node.parents:
temp_list = list(m_node.parents)
temp_list.remove(n_node)
m_node.parents = tuple(temp_list)
if not m_node.parents:
roots.append(m_node)
print(ordering_name) # print ordering by name
# gets ordering of nodes of the original instance
ordering = []
for name in ordering_name:
for node in network.nodes:
if node.name == name:
ordering.append(node)
return tuple(ordering)
Two problems: first, when network
is huge the deep copy will be resource consuming; second, I want an improvement to my nested for
loops which gets the ordering of the original instance. (For the second I think something like the sorted
method etc pops into my mind.)
Any suggestion?
I'm going to suggest a less literal implementation of the algorithm: you don't need to manipulate the DAG at all, you just need to manipulate info about the DAG. The only "interesting" things the algorithm needs are a mapping from a node to its children (the opposite of what your DAG actually stores), and a count of the number of each node's parents.
These are easy to compute, and dicts can be used to associate this info with node names (assuming all names are distinct - if not, you can invent unique names with a bit more code).
Then this should work:
There's one subtle point here: the outer loop appends to
result
while it's iterating overresult
. That's intentional. The effect is that every element inresult
is processed exactly once by the outer loop, regardless of whether an element was in the initialresult
or added later.Note that while the input
DAG
andNode
s are traversed, nothing in them is altered.