I have a DAG representing a list of properties. These properties are such that if a>b, then a has a directed edge to b. It is transitive as well, so that if a>b and b>c, then a has a directed edge to c.
However, the directed edge from a to c is superfluous because a has a directed edge to b and b has a directed edge to c. How can I prune all these superfluous edges? I was thinking of using a minimum spanning tree algorithm, but I'm not really sure what is the appropriate algorithm to apply in this situation
I suppose I could do a depth first search from each node and all its outgoing edges and compare if it can reach certain nodes without using certain edges, but this seems horribly inefficient and slow.
After the algorithm is complete, the output would be a linear list of all the nodes in an order that is consistent with the graph. So if a has three directed edges to b,c, and d. b and c also each of which has a directed edge to d, the output could be either abcd or acbd.
This is called the transitive reduction problem. Formally speaking, you are looking for a minimal (fewest edges) directed graph, the transitive closure of which is equal to the transitive closure of the input graph. (The diagram on the above Wikipedia link makes it clear.)
Apparently there exists an efficient algorithm for solving this problem that takes the same time as for producing a transitive closure (i.e. the more common inverse problem of adding transitive links instead of removing them), however the link to the 1972 paper by Aho, Garey, and Ullman costs $25 to download, and some quick googling didn't turn up any nice descriptions.
EDIT: Scott Cotton's graphlib
contains a Java implementation! This Java library looks to be very well organised.
Actually, after looking around a little more, I think a Topologicalsort is what I'm really after here.
If these are already n nodes with directed edges:
- Starting from any point M, loop all its child edge, select the biggest child (like N), remove other edges, the complexity should be o(n) . If no N exists (no child edge, goto step 3).
- start from N, repeat step 1.
- start from point M, select the smallest parent node ( like T), remove others' edges.
- start from T, repeat step 3.....
Actually it's just a ordering algorithm, and the totally complexity should be o(0.5n^2).
One problem is that if we want loop one node's parent nodes, then we need more memory to log edge so we can trace back from child to parent. This can be improved in the step 3 where we choose one node from the left nodes bigger than M, this means we need to keep a list of nodes to know what nodes are left..