What is a good way to represent a finite automaton in Haskell? What would its data type look like?
At our college, automata were defined as a 5-tuple
(Q, X, delta, q_0, F)
where Q is the set of the automaton's states, X is the alphabet (is this part even necessary?), delta is the transition function taking a pair from (Q, X) and returning a state (or several states in the non-deterministic version), q_0 is the initial state, and F is the set of accepting/end states.
Most importantly, I'm not sure what type delta should have...
There are two basic options:
- An explicit function delta :: Q -> X -> Q (or Q -> X -> [Q] for the non-deterministic case), as Sven Hager suggests.
- A map delta :: Map (Q, X) Q, e.g. using Data.Map, or, if your states/alphabet can be indexed by ascending numbers, Data.Array or Data.Vector.
Note that these two approaches are essentially equivalent: one can convert from the map version to a function version relatively easily (the result differs slightly because of the extra Maybe from the lookup call):
delta_func q x = Data.Map.lookup (q,x) delta_map
(Or the appropriately curried version of the look-up function for whatever mapping type you are using.)
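To make the map option concrete, here is a minimal sketch of a record holding the 5-tuple, with the conversion to a function and a runner on top. The names DFA, transitions, start, accepting, deltaFunc, accepts and evenOnes are all made up for illustration; Q and X are left implicit as the type parameters q and x.

```haskell
import qualified Data.Map as Map

-- Hypothetical record mirroring (Q, X, delta, q_0, F).
-- Q and X are represented by the type parameters q and x.
data DFA q x = DFA
  { transitions :: Map.Map (q, x) q  -- delta, stored as a map
  , start       :: q                 -- q_0
  , accepting   :: [q]               -- F
  }

-- Function view of delta; Nothing means no transition is defined.
deltaFunc :: (Ord q, Ord x) => DFA q x -> q -> x -> Maybe q
deltaFunc dfa q x = Map.lookup (q, x) (transitions dfa)

-- Run the automaton over a whole word; a missing transition rejects.
accepts :: (Ord q, Ord x) => DFA q x -> [x] -> Bool
accepts dfa = go (start dfa)
  where
    go q []       = q `elem` accepting dfa
    go q (x : xs) = maybe False (\q' -> go q' xs) (deltaFunc dfa q x)

-- Example automaton: binary strings containing an even number of '1's.
evenOnes :: DFA Int Char
evenOnes = DFA
  { transitions = Map.fromList
      [ ((0, '0'), 0), ((0, '1'), 1)
      , ((1, '0'), 1), ((1, '1'), 0) ]
  , start     = 0
  , accepting = [0]
  }
```

With this, accepts evenOnes "1001" is True and accepts evenOnes "1011" is False. A Data.Set would probably be a better fit than a list for the accepting states once there are many of them.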
If you are constructing the automata at compile time (and so know the possible states and can have them encoded as a data type), then using the function version gives you better type safety, as the compiler can verify that you have covered all cases.
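A small sketch of the compile-time case, assuming a hypothetical two-state parity automaton (State, Symbol, delta and acceptsEven are illustrative names). If one of the four equations is deleted, GHC reports the gap when compiled with -Wincomplete-patterns.

```haskell
-- Compile with -Wincomplete-patterns (part of -Wall) to have GHC
-- report any (state, symbol) combination that delta does not cover.

data State  = Even | Odd deriving (Eq, Show)
data Symbol = Zero | One deriving (Eq, Show)

-- Total transition function: every case is spelled out.
delta :: State -> Symbol -> State
delta Even Zero = Even
delta Even One  = Odd
delta Odd  Zero = Odd
delta Odd  One  = Even

-- Start in Even, fold delta over the input, accept if we end in Even.
acceptsEven :: [Symbol] -> Bool
acceptsEven = (== Even) . foldl delta Even
```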
If you are constructing the automata at run time (e.g. from user input), then store delta as a map (possibly doing the function conversion as above) and validate the input so that fromJust is safe, i.e. there is an entry in the map for every possible (Q, X) tuple, so the look-up never fails (never returns Nothing).
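One possible shape for that validation, as a sketch only: isValid and mkDelta are hypothetical helpers, and the states and alphabet are assumed to arrive as plain lists.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromJust)

-- Checks that delta covers every (state, symbol) pair and never leaves Q.
isValid :: (Ord q, Ord x) => [q] -> [x] -> Map.Map (q, x) q -> Bool
isValid states alphabet delta = total && closed
  where
    total  = all (`Map.member` delta) [(q, x) | q <- states, x <- alphabet]
    closed = all (`elem` states) (Map.elems delta)

-- Hypothetical smart constructor: hands back the function view only after
-- validation, so fromJust cannot fail for arguments drawn from the given
-- states and alphabet.
mkDelta :: (Ord q, Ord x) => [q] -> [x] -> Map.Map (q, x) q -> Maybe (q -> x -> q)
mkDelta states alphabet delta
  | isValid states alphabet delta = Just (\q x -> fromJust (Map.lookup (q, x) delta))
  | otherwise                     = Nothing
```

The returned function is still partial for values outside the validated lists, so the caller has to check each input symbol against the alphabet before applying it.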
Non-deterministic automata work well with the map option, because a failed look-up is the same as having no state to go to, i.e. an empty [Q] list, so the Maybe needs no special handling beyond a call to join . maybeToList (join is from Control.Monad and maybeToList is from Data.Maybe).
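A sketch of how that might look in practice. NDelta, step, runNFA, acceptsNFA and the example endsAb are names invented here; the only parts taken from the text above are the Map (Q, X) [Q] representation and join . maybeToList.

```haskell
import Control.Monad (join)
import Data.List (nub)
import qualified Data.Map as Map
import Data.Maybe (maybeToList)

type NDelta q x = Map.Map (q, x) [q]

-- All successor states; a missing entry becomes the empty list.
step :: (Ord q, Ord x) => NDelta q x -> q -> x -> [q]
step delta q x = join (maybeToList (Map.lookup (q, x) delta))

-- Track every state the automaton could be in after each symbol.
-- nub keeps the list of current states free of duplicates.
runNFA :: (Ord q, Ord x) => NDelta q x -> [q] -> [x] -> [q]
runNFA delta = foldl (\qs x -> nub (concatMap (\q -> step delta q x) qs))

-- Accept if any reachable state after the whole input is a final state.
acceptsNFA :: (Ord q, Ord x) => NDelta q x -> q -> [q] -> [x] -> Bool
acceptsNFA delta q0 final input = any (`elem` final) (runNFA delta [q0] input)

-- Example: words over {a, b} that end in "ab"; state 2 is accepting.
endsAb :: NDelta Int Char
endsAb = Map.fromList
  [ ((0, 'a'), [0, 1])
  , ((0, 'b'), [0])
  , ((1, 'b'), [2])
  ]
```

Here acceptsNFA endsAb 0 [2] "aab" is True, while "aba" is rejected. This sketch does not handle epsilon transitions.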
On a different note, the alphabet is most definitely necessary: it is how the automaton receives input.
Check out the Control.Arrow.Transformer.Automaton module in the "arrows" package. The type looks like this
newtype Automaton a b c = Automaton (a b (c, Automaton a b c))
This is a bit confusing because it's an arrow transformer. In the simplest case you can write
type Auto = Automaton (->)
This uses functions as the underlying arrow. Substituting (->) for "a" in the Automaton definition and using infix notation you can see this is roughly equivalent to:
newtype Auto b c = Automaton (b -> (c, Auto b c))
In other words an automaton is a function that takes an input and returns a result and a new automaton.
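To see that idea in isolation, here is a sketch that avoids the arrows package by declaring a standalone copy of the simplified type. The field name stepAuto and the helpers runAuto and sumFrom are my own additions, not part of the library.

```haskell
-- Standalone copy of the simplified type, so no arrows package is needed.
newtype Auto b c = Auto { stepAuto :: b -> (c, Auto b c) }

-- Feed a list of inputs through the automaton, collecting each output.
runAuto :: Auto b c -> [b] -> [c]
runAuto _ []       = []
runAuto a (x : xs) = let (y, a') = stepAuto a x in y : runAuto a' xs

-- Running sum: the "state" is just the argument captured in the closure.
sumFrom :: Int -> Auto Int Int
sumFrom total = Auto $ \n -> let total' = total + n in (total', sumFrom total')
```

For example, runAuto (sumFrom 0) [1, 2, 3] gives [1, 3, 6].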
You can use this directly by writing a function for each state that takes an argument and returns the result and the next function. For instance, here is a state machine to recognise the regexp "a+b" (that is, a series of at least one 'a' followed by a 'b'). (Note: untested code)
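-- Assumes: import Control.Arrow.Transformer.Automaton (Automaton(..))
-- Auto is a newtype, so each state has to be wrapped in the constructor.
state1, state2 :: Auto Char Bool
state1 = Automaton $ \c ->
    if c == 'a' then (False, state2) else (False, state1)
state2 = Automaton $ \c -> case c of
    'a' -> (False, state2)
    'b' -> (True, state1)
    _   -> (False, state1)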
In terms of your original question, Q = {state1, state2}, X = Char, delta is function application, and F is the state transition returning True (rather than having an "accepting state" I've used an output transition with an accepting value).
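A runnable variant of the same two-state machine, written against a standalone newtype rather than the arrows package so that it can be tried without extra dependencies. The names stepAuto, outputs and endsWithMatch are assumptions made for this sketch.

```haskell
import Data.List (mapAccumL)
import Data.Tuple (swap)

-- Standalone stand-in for Automaton (->); not the library type.
newtype Auto b c = Auto { stepAuto :: b -> (c, Auto b c) }

state1, state2 :: Auto Char Bool
state1 = Auto $ \c -> if c == 'a' then (False, state2) else (False, state1)
state2 = Auto $ \c -> case c of
  'a' -> (False, state2)
  'b' -> (True, state1)
  _   -> (False, state1)

-- Thread the automaton through the input, keeping the output of every step.
outputs :: Auto b c -> [b] -> [c]
outputs a0 = snd . mapAccumL (\a x -> swap (stepAuto a x)) a0

-- True when the output for the final character is True, i.e. the input
-- read so far ends in one or more 'a's followed by a 'b'.
endsWithMatch :: String -> Bool
endsWithMatch w = not (null w) && last (outputs state1 w)
```

Note that this machine reports a match whenever the input so far ends in a+b, so outputs state1 "aab" is [False, False, True] and "bab" also ends in a match. Rejecting words with a non-matching prefix would need an extra dead state.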
Alternatively you can use Arrow notation. Automaton is an instance of all the interesting arrow classes, including ArrowLoop and ArrowCircuit, so you can get access to previous values by using delay. (Note: again, untested code)
-- Needs {-# LANGUAGE Arrows #-}, Control.Arrow (returnA) and Control.Arrow.Operations (delay).
recognise :: Auto Char Bool
recognise = proc c -> do
    prev <- delay 'x' -< c   -- Doesn't matter what 'x' is, as long as it's not 'a'.
    returnA -< (prev == 'a' && c == 'b')
The "delay" arrow means that "prev" is equal to the previous value of "c" rather than the current value. You can also get access to your previous output by using "rec". For instance, here is an arrow that gives you a decaying total over time. (Actually tested in this case)
-- | Inputs are accumulated, but decay over time. Input is a (time, value) pair.
-- Output is a pair consisting of the previous output decayed, and the current output.
decay :: (ArrowCircuit a) => NominalDiffTime -> a (UTCTime, Double) (Double, Double)
decay tau = proc (t2, v2) -> do
    rec
        (t1, v1) <- delay (t0, 0) -< (t2, v)
        let
            dt = fromRational $ toRational $ diffUTCTime t2 t1
            v1a = v1 * exp (negate dt / tau1)
            v = v1a + v2
    returnA -< (v1a, v)
  where
    t0 = UTCTime (ModifiedJulianDay 0) (secondsToDiffTime 0)
    tau1 = fromRational $ toRational tau
Note how the input to "delay" includes "v", a value derived from its output. The "rec" clause enables this, so we can build up a feedback loop.
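If the behaviour of "delay" is hard to picture, it can be mimicked on a plain function-based automaton. The following is only a sketch under that assumption: Auto, stepAuto and delayAuto are standalone stand-ins written for this example, not the arrows package's own definitions.

```haskell
-- Standalone stand-in for Automaton (->); not the library type.
newtype Auto b c = Auto { stepAuto :: b -> (c, Auto b c) }

-- Emits the previous input; the first output is the supplied initial value.
delayAuto :: b -> Auto b b
delayAuto prev = Auto $ \x -> (prev, delayAuto x)

-- The recogniser from above without proc notation: each step asks the
-- delayed automaton for the previous character and compares it with c.
recognise :: Auto Char Bool
recognise = go (delayAuto 'x')
  where
    go d = Auto $ \c ->
      let (prev, d') = stepAuto d c
      in (prev == 'a' && c == 'b', go d')
```

Stepping recognise with 'a' and then 'b' yields False followed by True, because on the second step prev is 'a'.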