Good algorithm for finding the diameter of a (spar

2019-03-08 01:39发布

站内文章 / 前端开发

20 0

走好不送

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a large, connected, sparse graph in adjacency-list form. I would like to find two vertices that are as far apart as possible, that is, the diameter of the graph and two vertices achieving it.

I am interested in this problem in both the undirected and directed cases, for different applications. In the directed case, I of course care about directed distance (the shortest directed path from one vertex to another).

Is there a better approach than computing all-pairs shortest paths?

Edit: By "as far apart as possible", I of course mean the "longest shortest path" -- that is, the maximum over all pairs of vertices of the shortest distance from one to the other.

回答1:

Well, I've put a little bit of thought on the problem, and a bit of googling, and I'm sorry, but I can't find any algorithm that doesn't seem to be "just find all pairs shortest path".

However, if you assume that Floyd-Warshall is the only algorithm for computing such a thing (Big-Theta of |V|^3), then I have a bit of good news for you: Johnson's Algorithm for Sparse Graphs (thank you, trusty CLRS!) computes all pairs shortest paths in (Big-Oh (|V|^2 * lgV + VE)), which should be asymptotically faster for sparse graphs.

Wikipedia says it works for directed (not sure about undirected, but at least I can't think of a reason why not), here's the link.

Is there anything else about the graph that may be useful? If it can be mapped easily onto a 2D plane (so, its planar and the edge weights obey the triangle inequality [it may need to satisfy a stricter requirement, I'm not sure]) you may be able to break out some geometric algorithms (convex-hull can run in nlogn, and finding the farthest pair of points is easy from there).

Hope this helps! - Agor

Edit: I hope the link works now. If not, just google it. :)

回答2:

I don't know of a better method for computing diameter other than all shortest paths, but Mathematica uses the following approximation for PseudoDiameter:

A graph geodesic is the shortest path between two vertices of a graph. The graph diameter is the longest possible length of all graph geodesics of the graph. PseudoDiameter finds an approximate graph diameter. It works by starting from a vertex u, and finds a vertex v that is farthest away from u. This process is repeated by treating v as the new starting vertex, and ends when the graph distance no longer increases. A vertex from the last level set that has the smallest degree is chosen as the final starting vertex u, and a traversal is done to see if the graph distance can be increased. This graph distance is taken to be the pseudo-diameter.

http://reference.wolfram.com/mathematica/GraphUtilities/ref/PseudoDiameter.html

回答3:

Edit I'm undeleting again, simply so I can continue commenting. I have some comments on Johnson's Algorithm below this answer. - Aaron

My original comment : I too am curious about this problem, but don't have an answer. It seems related to the Minimum Spanning Tree, the subgraph connecting all vertices but having fewest (or lowest weight) edges. That is an old problem with a number of algorithms; some of which seem quite easy to implement.

I had initially hoped that the diameter would be obvious once the MST had been found, but I'm losing hope now :-( Perhaps the MST can be used to place a reasonable upper bound on the diameter, which you can use to speed up your search for the actual diameter?

回答4:

Here's some thoughts on doing better than all pairs shortest paths in an undirected graph, although I'm not sure just how much of an improvement it would be.

Here's a subroutine that will find two nodes distance D apart, if there are any. Pick an arbitrary node x and compute M[x] = maximum distance from x to any other node (using any single source shortest path algorithm). If M[x] >= D, then x is one of our nodes and the other is easy to find. However, if M[x] < D, then neither endpoint we're looking for can be less than distance D - M[x] from x (because there are paths from that node to all other nodes, through x, of distance < D). So find all nodes of distance less than D-M[x] from x and mark them as bad. Pick a new x, this time making sure we avoid all nodes marked as bad, and repeat. Hopefully, we'll mark lots of nodes as bad so we'll have to do many fewer than |V| shortest path computations.

Now we just need to set D=diam(G) and run the above procedure. We don't know what diam(G) is, but we can get a pretty tight range for it, as for any x, M[x] <= diam(G) <= 2M[x]. Pick a few x to start, compute M[x] for each, and compute upper and lower bounds on diam(G) as a result. We can then do binary search in the resulting range, using the above procedure to find a path of the guessed length, if any.

Of course, this is undirected only. I think you could do a similar scheme with directed graphs. The bad nodes are those which can reach x in less than D-M[x], and the upper bound on diam(G) doesn't work so you'd need a larger binary search range.

回答5:

Not sure if it fits the bill, but interesting:

HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop

U. Kang, C. Tsourakakis, A. P. Appel, C. Faloutsos, J. Leskovec, “HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop”, CMU ML Tech Report CMU-ML-08-117, 2008.

回答6:

if the graph is a tree (and undirected). you can simply run 2 dfs's. Start at a random node u and dfs to find farthest node v. Then start at v and find farthest length. That length is optimal

回答7:

Forgive me if my answer is not correct in terms of syntax but my Algorithm course was a while ago (and not in English).

If I understand your problem correctly, you want to know what is the highest number you can count up to by starting from node A and reaching node B without "retracing" your steps. If this is the case, I would imagine your graph as acyclic (the cyclic option comes later).

First of all, the upper limit is the number of edges. How I see the thing is: take one node, create a tree where the node is at the root and each subsequent node you can reach is at the next level. The height of the tree you build is the diameter, and the leaves are the nodes that are at that distance. If that distance = number of edges you're done. If not, pick another node and repeat.

I think it's similar to the building of a breadth-first search. Not knowing much else about the graph you could employ some heuristics to see which tree orientation (i.e. which node should be picked first) would be better, but that's another topic.

Regarding the cyclicity of the graph -- as others have pointed out those can lead to infinite loops. A way to get rid of those could be to 'rule out' the nodes that belong to a cycle and adding the longest path between them as the value you'll get by entering the cycle and coming out of it, touch each node only once.

Now, as I said this method could very easily be the same as doing all-paires shortest path. Worst case complexity is certainly the same, and could not be otherwise.

回答8:

I really doubt that there is any method of finding the longest-shortest path without having to use some sort of all pairs shortest path algorithm (finding single source shortest path repeatedly is basically doing all pairs in the worst case).

'Diameter' becomes hard to define in terms of the 'longest path' if the graph is not a tree or a DAG. The 'longest' path can be infinite if there are cycles in the graph. Hence a simple traversal of the graph cannot yield the longest path over all nodes. Since you have already stated that your graph is not necessarily acyclic, and you are interested in the "longest shortest" path, there does not seem to be any method that can avoid finding the shortest path for all nodes. Using Johnson's Algorithm, as Agor suggested, is probably best for doing that.

You can of course go with a heuristics based approach. The algorithm that uses pseudo-peripheral vertex appears to be the most commonly used approach.

回答9:

One way to obtain an estimate of this number is to start at a random point, and do a breadth-first "grassfire" algorithm, marking the shortest distance to each node. The longest distance here is your estimate.

Running this extremely fast algorithm multiple times with different starting points and then taking the maximum will increase the accuracy of the estimate, and, of course, give you a decent lower bound. Depending on the distribution and connectivity of your graph, this estimate may even be accurate!

If your graph is large enough, asymptotic analysis of how the estimate changes as you add more samples might allow you to project to an even better guess.

If you're interested in an exact answer, it seems unlikely that you can get away with cutting corners too much unless your graph is easy to partition into components that are weakly connected with each other - in which case you can restrict your search to shortest path between all pairs of vertices in different components.

回答10:

Pick a vertex v and do BFS(v), this will calculate the distance from v for all vertices. Get the longest distance. This is O(V+E)

Now run this algorithm for all v vertices and pick the maximum of these longest distances. Overall complexity: O(V*(V+E))

回答11:

A dirty method:

We know that for a graph G(V,E) with |V|=n and |E|=m, Dijkstra algorithm runs in O(m+nlogn) and this is for a single source. For your all-pairs problem, you need to run Dijkstra for each node as a starting point.

However, if you have many machines, you can easily parallel this process.

This method is easiest to implement, definitely not very good.

回答12:

Yes there is a better method for finding the Diameter of the graph. Here I made a simple class to demonstrate it. The Vertices would be the Points on your graph.

public class MyTestClass
{
    //Simple Point struct
    struct Vertex
    {
        public float X, Y;
        public Vertex(float pX, float pY)
        {
            X = pX;
            Y = pY;
        }
    }

    //For getting the bounds of your graph
    struct BoundingBox
    {
        public float Left, Right, Bottom, Top;
        public BoundingBox(float pLeft, float pRight, float pBottom, float pTop)
        {
            Left = pLeft;
            Right = pRight;
            Bottom = pBottom;
            Top = pTop;
        }
    }

    //Properties
    Vertex[] vertices;
    BoundingBox bound;
    float diameter;

    //Constructor
    //Here is the fastest way to get the diameter >>
    public MyTestClass()
    {
        //Init objects
        vertices = new Vertex[100];
        for(int i = 0; i != vertices.Length; ++i) vertices[i] = new Vertex(i, i);
        bound = new BoundingBox(vertices[0].X, vertices[0].X, vertices[0].Y, vertices[0].Y);
        //Calculate BoundingBox
        for(int i = 0; i != vertices.Length; ++i)
        {
            bound.Left = (vertices[i].X <= bound.Left) ? vertices[i].X:bound.Left;
            bound.Right = (vertices[i].X >= bound.Right) ? vertices[i].X:bound.Right;
            bound.Bottom = (vertices[i].Y <= bound.Bottom) ? vertices[i].Y:bound.Bottom;//NOTE: If Y is faces down, then flip bottom & top comparison
            bound.Top = (vertices[i].Y >= bound.Top) ? vertices[i].Y:bound.Top;
        }
        //Messure Size of the BoundingBox
        float vecX = (bound.Right-bound.Left);
        float vecY = (bound.Top-bound.Bottom);
        diameter = (float)System.Math.Sqrt((vecX*vecX) + (vecY*vecY));
    }
}