I currently have a list of connections stored in a list where each connection is a directed link that connects two points and no point ever links to more than one point or is linked to by more than one point. For example:
connections = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
Should produce:
ordered = [ [ 4, 6, 5, 3, 7, 8 ], [ 1, 2, 1 ] ]
I have attempt to do this using an algorithm that takes an input point and a list of connections and recursively calls itself to find the next point and add it to the growing ordered list. However, my algorithm breaks down when I don't start with the correct point (though this should just be a matter of repeating the same algorithm in reverse), but also when there are multiple unconnected strands
What would be the best way of writing an efficient algorithm to order these connections?
Algorithm for a Solution
You're looking for a topological sort algorithm:
from collections import defaultdict
def topological_sort(dependency_pairs):
'Sort values subject to dependency constraints'
num_heads = defaultdict(int) # num arrows pointing in
tails = defaultdict(list) # list of arrows going out
for h, t in dependency_pairs:
num_heads[t] += 1
tails[h].append(t)
ordered = [h for h in tails if h not in num_heads]
for h in ordered:
for t in tails[h]:
num_heads[t] -= 1
if not num_heads[t]:
ordered.append(t)
cyclic = [n for n, heads in num_heads.iteritems() if heads]
return ordered, cyclic
if __name__ == '__main__':
connections = [(3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1)]
print topological_sort(connections)
Here is the output for your sample data:
([4, 6, 5, 3, 7, 8], [1, 2])
The runtime is linearly proportional to the number of edges (dependency pairs).
HOW IT WORKS
The algorithm is organized around a lookup table called num_heads that keeps a count the number of predecessors (incoming arrows). Consider an example with the following connections: a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b
, the counts are:
node number of incoming edges
---- ------------------------
a 0
b 2
c 0
d 2
e 1
f 1
g 2
h 2
i 1
The algorithm works by "visting" nodes with no predecessors. For example, nodes a
and c
have no incoming edges, so they are visited first.
Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.
For example, in visiting node a
, we go to its successor h
to decrement its incoming count by one (so that h 2
becomes h 1
.
Likewise, when visiting node c
, we loop over its successors f
and h
, decrementing their counts by one (so that f 1
becomes f 0
and h 1
becomes h 0
).
The nodes f
and h
no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort is):
a c f h e d i b g
If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits to show the requested results.
Something like this:
from collections import defaultdict
lis = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
dic = defaultdict(list)
for k,v in lis:
if v not in dic:
dic[k].append(v)
else:
dic[k].extend([v]+dic[v])
del dic[v]
for k,v in dic.items():
for x in v:
if x in dic and x!=k:
dic[k].extend(dic[x])
del dic[x]
print dic
print [[k]+v for k,v in dic.items()]
output:
defaultdict(<type 'list'>, {2: [1, 2], 4: [6, 5, 3, 7, 8]})
[[2, 1, 2], [4, 6, 5, 3, 7, 8]]
I think you can probably do it in O(n)
with something like this:
ordered = {}
for each connection (s,t):
if t exists in ordered :
ordered[s] = [t] + ordered[t]
del ordered[t]
else:
ordered[s] = [t]
# Now merge...
for each ordered (s : [t1, t2, ... tN]):
ordered[s] = [t1, t2, ... tN] + ordered[tN]
del ordered[tN]
At the end you will be left with something like this, but you can convert that rather easily
ordered = {
4 : [6, 5, 3, 7, 8],
2 : [1, 2]
}