It's easy enough to represent a tree or list in Haskell using algebraic data types. But how would you go about typographically representing a graph? It seems that you need to have pointers. I'm guessing you could have something like:
```haskell
type Nodetag = String
type Neighbours = [Nodetag]
data Node a = Node a Nodetag Neighbours
```
And that would be workable. However, it feels a bit decoupled; the links between different nodes in the structure don't really "feel" as solid as the links between the current, previous, and next elements in a list, or between the parent and children of a node in a tree. I have a hunch that doing algebraic manipulations on the graph as I defined it would be somewhat hindered by the level of indirection introduced through the tag system.
It is primarily this feeling of doubt and perception of inelegance that causes me to ask this question. Is there a better or more mathematically elegant way of defining graphs in Haskell? Or have I stumbled upon something inherently hard/fundamental? Recursive data structures are sweet, but this seems to be something else: a self-referential data structure in a different sense from how trees and lists are self-referential. It's as if lists and trees are self-referential at the type level, but graphs are self-referential at the value level.
So what's really going on?
It's true, graphs are not algebraic. To deal with this problem, you have a couple of options. You can represent cycles implicitly, using laziness to tie the knot (see the answers below); this keeps the definition pleasant, but the resulting structure is of limited use because the cycles cannot be observed from pure code. Or you can give up some elegance and use an explicit representation, labeling nodes (e.g. with `Int`s) and referring to them indirectly rather than algebraically. The latter can be made significantly more convenient by making the type abstract and providing an interface that juggles the indirection for you. This is the approach taken by, e.g., fgl and other practical graph libraries on Hackage (a small example follows below). So there are pros and cons to each of these choices; pick the one that seems best for you.
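As a rough sketch of that indirect, label-based style, here is how a small graph might be built with fgl (this assumes the fgl package; the concrete labels and edges are made up for illustration):

```haskell
import Data.Graph.Inductive

-- A small directed cycle a -> b -> c -> a, with String node labels
-- and unlabelled edges. Nodes are identified by plain Ints.
g :: Gr String ()
g = mkGraph [(1, "a"), (2, "b"), (3, "c")]
            [(1, 2, ()), (2, 3, ()), (3, 1, ())]

main :: IO ()
main = do
  print (nodes g)   -- node identifiers: [1,2,3]
  print (suc g 1)   -- successors of node 1: [2]
```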
I always liked Martin Erwig's approach in "Inductive Graphs and Functional Graph Algorithms", which you can read here. FWIW, I once wrote a Scala implementation as well, see https://github.com/nicolast/scalagraphs.
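Roughly, the core idea of the paper is something like this (the type names below are only illustrative; fgl's actual definitions differ in detail): a graph is either empty, or a node's context combined with the rest of the graph.

```haskell
-- A context bundles a node with its label and its incoming and
-- outgoing (edge-labelled) adjacencies.
type Node  = Int
type Adj b = [(b, Node)]

data Context a b = Context (Adj b) Node a (Adj b)

-- Inductive view: a graph is Empty, or a Context "and" a smaller graph.
data Graph a b
  = Empty
  | Context a b :& Graph a b
```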
As Ben mentioned, cyclic data in Haskell is constructed by a mechanism called "tying the knot". In practice, it means that we write mutually recursive declarations using `let` or `where` clauses, which works because the mutually recursive parts are lazily evaluated.

Here's an example graph type:
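A sketch of such a type, with each node holding its label and direct references to its neighbouring nodes (the record field names here are illustrative):

```haskell
data Node a = Node
    { label    :: a
    , adjacent :: [Node a]
    }

newtype Graph a = Graph [Node a]
```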
As you can see, we use actual `Node` references instead of indirection. Here's how to implement a function that constructs the graph from a list of label associations:
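A sketch of such a constructor, assuming the `Node`/`Graph` type above and `Eq`-comparable labels (the helpers `mkNode` and `nodeLookupList` are the ones discussed below):

```haskell
import Data.Maybe (fromJust)

mkGraph :: Eq a => [(a, [a])] -> Graph a
mkGraph links = Graph (map snd nodeLookupList)
  where
    -- The knot: nodeLookupList is built with mkNode, and mkNode looks
    -- adjacent nodes up in nodeLookupList itself. Laziness makes this work.
    nodeLookupList = map mkNode links
    mkNode (lbl, adj) = (lbl, Node lbl (map lookupNode adj))
    lookupNode lbl = fromJust (lookup lbl nodeLookupList)
```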
We take in a list of `(nodeLabel, [adjacentLabel])` pairs and construct the actual `Node` values via an intermediate lookup list (which does the actual knot-tying). The trick is that `nodeLookupList` (which has the type `[(a, Node a)]`) is constructed using `mkNode`, which in turn refers back to `nodeLookupList` to find the adjacent nodes.

I also find it awkward to try to represent data structures with cycles in a pure language. It's the cycles that are really the problem; because values can be shared, any ADT that can contain a member of its own type (including lists and trees) is really a DAG (directed acyclic graph). The fundamental issue is that if you have values A and B, with A containing B and B containing A, then neither can be created before the other exists. Because Haskell is lazy you can use a trick known as Tying the Knot to get around this, but that makes my brain hurt (because I haven't done much of it yet). I've done more of my substantial programming in Mercury than Haskell so far, and Mercury is strict, so knot-tying doesn't help.
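A tiny illustration of that point (the `Person` type and the names are made up): in a lazy language the two mutually referring values below are fine, whereas a strict language would have to fully build one before the other.

```haskell
-- Two mutually referring values; laziness lets each mention the other.
data Person = Person { name :: String, friend :: Person }

alice, bob :: Person
alice = Person "Alice" bob
bob   = Person "Bob"   alice

main :: IO ()
main = putStrLn (name (friend alice))  -- prints "Bob"
```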
Usually when I've run into this before, I've just resorted to additional indirection, as you're suggesting: often by using a map from ids to the actual elements, and having elements contain references to the ids instead of to other elements (sketched below). The main thing I didn't like about doing that (aside from the obvious inefficiency) is that it felt more fragile, introducing the possible errors of looking up an id that doesn't exist or trying to assign the same id to more than one element. You can write code so that these errors won't occur, of course, and even hide it behind abstractions so that the only places where such errors could occur are bounded. But it's still one more thing to get wrong.
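A sketch of that id-based style (the `NodeId`, `Element`, and `neighbourValues` names are illustrative), where a dangling id shows up as a `Nothing`:

```haskell
import qualified Data.Map as Map

-- Id-indexed representation: elements refer to each other by id rather
-- than holding the neighbouring values directly.
type NodeId = Int

data Element a = Element
  { value      :: a
  , neighbours :: [NodeId]
  }

type GraphMap a = Map.Map NodeId (Element a)

-- Looking up a neighbour can fail if an id dangles; the Maybe makes
-- that fragility explicit.
neighbourValues :: GraphMap a -> NodeId -> Maybe [a]
neighbourValues g i = do
  el <- Map.lookup i g
  traverse (fmap value . flip Map.lookup g) (neighbours el)
```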
However, a quick Google search for "Haskell graph" led me to http://www.haskell.org/haskellwiki/The_Monad.Reader/Issue5/Practical_Graph_Handling, which looks like a worthwhile read.
Any discussion of representing graphs in Haskell needs a mention of Andy Gill's data-reify library (here is the paper).
The "tying-the-knot" style representation can be used to make very elegant DSLs (see example below). However, the data structure is of limited use. Gill's library allows you the best of both worlds. You can use a "tying the knot" DSL, but then convert the pointer-based graph into a label-based graph so you can run your algorithms of choice on it.
Here is a simple example:
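Something along these lines illustrates the idea (the `Node`/`NodeF` types and the three-node cycle are only an illustrative sketch, with their definitions given further below):

```haskell
-- A tiny cyclic structure written directly, "tying the knot" style:
-- a -> b -> c -> a
a, b, c :: Node
a = Node "a" [b]
b = Node "b" [c]
c = Node "c" [a]

-- Reify the pointer-based values into an explicit, label-based graph.
main :: IO ()
main = do
  Graph nodeList root <- reifyGraph a
  print root            -- the Unique assigned to the root node
  mapM_ print nodeList  -- each node as (Unique, NodeF String [Unique])
```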
To run the above code you will need the following definitions:
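A sketch of those supporting definitions, assuming the data-reify package (which provides `MuRef`, `DeRef`, `reifyGraph`, and `Graph`); `Node` and `NodeF` are illustrative names rather than anything the library exports:

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

import Data.Reify  -- from the data-reify package

-- The pointer-based type the DSL is written in.
data Node = Node String [Node]

-- The label-based functor that reification produces: children become
-- references of type ref (Unique, i.e. Int, after reifyGraph).
data NodeF ref = NodeF String [ref]
  deriving Show

instance MuRef Node where
  type DeRef Node = NodeF
  mapDeRef f (Node lbl children) = NodeF lbl <$> traverse f children
```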
I want to stress that this is a simplistic DSL, but the sky's the limit! I designed a very featureful DSL, including a nice tree-like syntax for having a node broadcast an initial value to some of its children, and many convenience functions for constructing specific node types. Of course, the `Node` data type and `mapDeRef` definitions were much more involved.
I like this implementation of a graph, taken from here.