I'm currently following Steve Yegge's advice on preparing for a technical programming interview: http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html
In his section on Graphs, he states:
"There are three basic ways to represent a graph in memory (objects and pointers, matrix, and adjacency list), and you should familiarize yourself with each representation and its pros and cons."
The pros and cons of matrix and adjacency list representations are described in CLRS, but I haven't been able to find a resource that compares these to an object representation.
Just by thinking about it, I can infer some of this myself, but I'd like to make sure I haven't missed something important. If someone could describe this comprehensively, or point me to a resource which does so, I would greatly appreciate it.
objects and pointers
These are just basic data structures, as hammar said in the other answer. In Java you would represent this with classes such as Edge and Vertex. For example, an edge connects two vertices, can be either directed or undirected, and can carry a weight. A vertex can have an ID, a name, and so on. Usually both have additional properties. You can then construct your graph from them like this:
Vertex a = new Vertex(1);
Vertex b = new Vertex(2);
Edge edge = new Edge(a, b, 30); // create an edge between a and b with weight 30
This approach is commonly used for object-oriented implementations, since it is more readable and convenient for object-oriented users ;).
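To make the snippet above concrete, here is a minimal sketch of what such Vertex and Edge classes might look like. The field names (`id`, `outgoing`, `from`, `to`, `weight`) and the choice to register each edge with its source vertex are my own assumptions for illustration, not something the answer prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: builds the two-vertex graph from the snippet above.
class ObjectGraphDemo {
    public static void main(String[] args) {
        Vertex a = new Vertex(1);
        Vertex b = new Vertex(2);
        Edge edge = new Edge(a, b, 30); // directed edge a -> b with weight 30

        System.out.println(a.outgoing.size()); // 1
        System.out.println(edge.to.id);        // 2
    }
}

// A vertex holds its own data plus references to its outgoing edges (assumed design).
class Vertex {
    final int id;
    final List<Edge> outgoing = new ArrayList<>();

    Vertex(int id) {
        this.id = id;
    }
}

// An edge points at both endpoints and carries a weight.
class Edge {
    final Vertex from;
    final Vertex to;
    final int weight;

    Edge(Vertex from, Vertex to, int weight) {
        this.from = from;
        this.to = to;
        this.weight = weight;
        from.outgoing.add(this); // let the source vertex find this edge later
    }
}
```

For an undirected graph, one option would be to add the edge to both endpoints' lists instead of only the source's.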
matrix
A matrix is just a simple two-dimensional array. Assuming your vertex IDs are ints that can be used as array indices, it looks like this:
int[][] adjacencyMatrix = new int[SIZE][SIZE]; // SIZE is the number of vertices in our graph
adjacencyMatrix[0][1] = 30; // sets the weight of the edge from vertex 0 to vertex 1
This is commonly used for dense graphs where index access is necessary. You can represent both directed and undirected graphs, weighted or unweighted, with this structure.
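As a rough sketch of the trade-off, the class below wraps the array in a few operations. The class name `MatrixGraph`, its method names, and the convention that a weight of 0 means "no edge" are assumptions for this example. Checking one edge is a single array access, while listing neighbours scans a whole row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around an adjacency matrix; 0 is assumed to mean "no edge".
class MatrixGraph {
    private final int[][] weights;

    MatrixGraph(int size) {
        weights = new int[size][size]; // O(V^2) memory regardless of edge count
    }

    // Directed edge; call twice with swapped arguments for an undirected edge.
    void addEdge(int from, int to, int weight) {
        weights[from][to] = weight;
    }

    // O(1): a single array access.
    boolean hasEdge(int from, int to) {
        return weights[from][to] != 0;
    }

    // O(V): every column of the row has to be inspected.
    List<Integer> neighbours(int v) {
        List<Integer> result = new ArrayList<>();
        for (int to = 0; to < weights[v].length; to++) {
            if (weights[v][to] != 0) {
                result.add(to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MatrixGraph g = new MatrixGraph(4);
        g.addEdge(0, 1, 30);
        g.addEdge(0, 3, 5);
        System.out.println(g.hasEdge(0, 1));  // true
        System.out.println(g.hasEdge(1, 0));  // false
        System.out.println(g.neighbours(0));  // [1, 3]
    }
}
```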
adjacency list
This is just a simple mix of data structures. I usually implement it using a HashMap<Vertex, List<Vertex>>. Guava's HashMultimap can be used similarly.
This approach is nice because you get O(1) (amortized) vertex lookup, and the lookup returns a list of all vertices adjacent to the vertex you asked for.
Map<Vertex, List<Vertex>> map = new HashMap<>();
List<Vertex> list = new ArrayList<>();
list.add(new Vertex(2));
list.add(new Vertex(3));
map.put(new Vertex(1), list); // vertex 1 is adjacent to 2 and 3; Vertex must implement equals() and hashCode()
This is used for representing sparse graphs. If you are applying at Google, you should know that the web graph is sparse. You can deal with such graphs in a more scalable way using a BigTable.
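One detail worth spelling out: a HashMap keyed by Vertex only behaves as expected if Vertex defines equals() and hashCode(). Otherwise a freshly created new Vertex(1) will not find the entry stored under an earlier new Vertex(1). Below is a minimal sketch under that assumption; the class `AdjacencyListGraph` and its methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical adjacency-list graph backed by a HashMap.
class AdjacencyListGraph {
    private final Map<Vertex, List<Vertex>> adjacency = new HashMap<>();

    // Directed edge from -> to; both endpoints become known to the graph.
    void addEdge(Vertex from, Vertex to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Expected O(1) lookup; unknown vertices yield an empty list.
    List<Vertex> neighbours(Vertex v) {
        return adjacency.getOrDefault(v, Collections.emptyList());
    }

    public static void main(String[] args) {
        AdjacencyListGraph g = new AdjacencyListGraph();
        g.addEdge(new Vertex(1), new Vertex(2));
        g.addEdge(new Vertex(1), new Vertex(3));
        // Works with a fresh Vertex(1) only because equals/hashCode use the id.
        System.out.println(g.neighbours(new Vertex(1)).size()); // 2
    }
}

// Vertex identity is assumed to be its integer id.
final class Vertex {
    final int id;

    Vertex(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Vertex && ((Vertex) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}
```

Memory use here grows with the number of vertices plus edges, which is why this layout suits sparse graphs.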
Oh and BTW, here is a very good summary of this post with fancy pictures ;)
Objects and pointers is mostly the same as adjacency list, at least for the purpose of comparing algorithms that use these representations.
Compare
struct Node {
Node *neighbours[];
};
with
struct Node {
Node *left;
Node *right;
};
You can easily construct the list of neighbours on-the-fly in the latter case, if it is easier to work with than named pointers.
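For instance, a sketch of that on-the-fly construction in Java might look like the following. `BinaryNode` and `neighbours()` are names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with named pointers, as in the second struct above.
class BinaryNode {
    BinaryNode left;
    BinaryNode right;

    // Builds the neighbour list on demand from the named pointers.
    List<BinaryNode> neighbours() {
        List<BinaryNode> result = new ArrayList<>(2);
        if (left != null) {
            result.add(left);
        }
        if (right != null) {
            result.add(right);
        }
        return result;
    }

    public static void main(String[] args) {
        BinaryNode root = new BinaryNode();
        root.left = new BinaryNode();
        System.out.println(root.neighbours().size()); // 1
    }
}
```

A traversal written against neighbours() then works the same way for both node layouts.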
An advantage of the object representation (incidence list) is that two adjacent vertices share the same instance of the edge. This makes it easy to manipulate the data of an undirected edge (length, cost, flow, or even direction). However, it uses extra memory for pointers.
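A small sketch of the shared-instance idea, with hypothetical `Node` and `Link` classes (the names, the `cost` field, and the `connect` factory are assumptions for this example): updating the edge once is visible from both endpoints, because both hold a reference to the same object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of an incidence list with a shared edge instance.
class IncidenceListDemo {
    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Link link = Link.connect(a, b, 30);

        link.cost = 5; // one update...
        System.out.println(b.incident.get(0).cost);                 // 5
        System.out.println(a.incident.get(0) == b.incident.get(0)); // true
    }
}

// Each node keeps references to the edges incident to it.
class Node {
    final String name;
    final List<Link> incident = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

// One undirected edge object, referenced from both endpoints.
class Link {
    final Node u;
    final Node v;
    int cost;

    private Link(Node u, Node v, int cost) {
        this.u = u;
        this.v = v;
        this.cost = cost;
    }

    // Creates the edge and registers the same instance with both endpoints.
    static Link connect(Node u, Node v, int cost) {
        Link link = new Link(u, v, cost);
        u.incident.add(link);
        v.incident.add(link);
        return link;
    }

    // Returns the endpoint opposite to n.
    Node other(Node n) {
        return n == u ? v : u;
    }
}
```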
Another good resource: Khan Academy - "Representing Graphs"
Besides adjacency list and adjacency matrix, they list "edge lists" as a 3rd type of graph representation. An edge list could be interpreted as a list of "edge objects" like those in Thomas's "objects and pointers" answer.
Advantage: We can store more information about the edge (mentioned by Michal)
Disadvantage: It's a very slow data structure to work with:
- Look up an edge: O(log e) if the list is kept sorted, O(e) otherwise
- Remove an edge: O(e)
- Find all nodes adjacent to a given node: O(e)
- Determine whether there exists a path between two nodes: O(e^2)
e = number of edges
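To see where these costs come from, here is a minimal sketch of an unsorted edge list. The class `EdgeListGraph` and its methods are hypothetical. Every operation except insertion walks the whole list, so with an unsorted list even the lookup is a linear scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical edge list: each entry is a directed pair {from, to}, in no particular order.
class EdgeListGraph {
    private final List<int[]> edges = new ArrayList<>();

    // Amortized O(1): just append.
    void addEdge(int from, int to) {
        edges.add(new int[] {from, to});
    }

    // O(e): linear scan, since the list is unsorted.
    boolean hasEdge(int from, int to) {
        for (int[] e : edges) {
            if (e[0] == from && e[1] == to) {
                return true;
            }
        }
        return false;
    }

    // O(e): scans the list and removes every matching entry.
    boolean removeEdge(int from, int to) {
        return edges.removeIf(e -> e[0] == from && e[1] == to);
    }

    // O(e): every edge must be inspected to collect the neighbours of v.
    List<Integer> neighbours(int v) {
        List<Integer> result = new ArrayList<>();
        for (int[] e : edges) {
            if (e[0] == v) {
                result.add(e[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        EdgeListGraph g = new EdgeListGraph();
        g.addEdge(1, 2);
        g.addEdge(1, 3);
        g.addEdge(2, 3);
        System.out.println(g.hasEdge(1, 3));    // true
        System.out.println(g.neighbours(1));    // [2, 3]
        System.out.println(g.removeEdge(1, 2)); // true
        System.out.println(g.neighbours(1));    // [3]
    }
}
```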