Tying the Knot with a State monad

2019-03-08 09:55发布

问题:

I'm working on a Haskell project that involves tying a big knot: I'm parsing a serialized representation of a graph, where each node is at some offset into the file, and may reference another node by its offset. So I need to build up a map from offsets to nodes while parsing, which I can feed back to myself in a do rec block.

I have this working, and kinda-sorta-reasonably abstracted into a StateT-esque monad transformer:

{-# LANGUAGE DoRec, GeneralizedNewtypeDeriving #-}

import qualified Control.Monad.State as S

data Knot s = Knot { past :: s, future :: s }

newtype RecStateT s m a = RecStateT (S.StateT (Knot s) m a) deriving
  ( Alternative
  , Applicative
  , Functor
  , Monad
  , MonadCont
  , MonadError e
  , MonadFix
  , MonadIO
  , MonadPlus
  , MonadReader r
  , MonadTrans
  , MonadWriter w )

runRecStateT :: RecStateT s m a -> Knot s -> m (a, Knot s)
runRecStateT (RecStateT st) = S.runStateT st

tie :: MonadFix m => RecStateT s m a -> s -> m (a, s)
tie m s = do
  rec (a, Knot s' _) <- runRecStateT m (Knot s s')
  return (a, s')

get :: Monad m => RecStateT s m (Knot s)
get = RecStateT S.get

put :: Monad m => s -> RecStateT s m ()
put s = RecStateT $ S.modify $ \ ~(Knot _ s') -> Knot s s'

The tie function is where the magic happens: the call to runRecStateT produces a value and a state, which I feed it as its own future. Note that get allows you to read from both the past and future states, but put only allows you to modify the "present."

Question 1: Does this seem like a decent way to implement this knot-tying pattern in general? Or better still, has somebody implemented a general solution to this, that I overlooked when snooping through Hackage? I beat my head against the Cont monad for a while, since it seemed possibly more elegant (see similar post from Dan Burton), but I just couldn't work it out.

Totally subjective Question 2: I'm not totally thrilled with the way my calling code ends up looking:

do
  Knot past future <- get
  let {- ... -} = past
      {- ... -} = future
      node = {- ... -}
  put $ {- ... -}
  return node

Implementation details here omitted, obviously, the important point being that I have to get the past and future state, pattern-match them inside a let binding (or explicitly make the previous pattern lazy) to extract whatever I care about, then build my node, update my state and finally return the node. Seems unnecessarily verbose, and I particularly dislike how easy it is to accidentally make the pattern that extracts the past and future states strict. So, can anybody think of a nicer interface?

回答1:

I've been playing around with stuff, and I think I've come up with something... interesting. I call it the "Seer" monad, and it provides (aside from Monad operations) two primitive operations:

see  :: Monoid s => Seer s s
send :: Monoid s => s -> Seer s ()

and a run operation:

runSeer :: Monoid s => Seer s a -> a

The way this monad works is that see allows a seer to see everything, and send allows a seer to "send" information to all other seers for them to see. Whenever any seer performs the see operation, they are able to see all of the information that has been sent, and all of the information that will be sent. In other words, within a given run, see will always produce the same result no matter where or when you call it. Another way of saying it is that see is how you get a working reference to the "tied" knot.

This is actually very similar to just using fix, except that all of the sub-parts are added incrementally and implicitly, rather than explicitly. Obviously, seers will not work correctly in the presence of a paradox, and sufficient laziness is required. For example, see >>= send may cause an explosion of information, trapping you in a time loop.

A dumb example:

import Control.Seer
import qualified Data.Map as M
import Data.Map (Map, (!))

bar :: Seer (Map Int Char) String
bar = do
  m <- see
  send (M.singleton 1 $ succ (m ! 2))
  send (M.singleton 2 'c')
  return [m ! 1, m ! 2]

As I said, I've just been toying around, so I have no idea if this is any better than what you've got, or if it's any good at all! But it's nifty, and relevant, and if your "knot" state is a Monoid, then it just might be useful to you. Fair warning: I built Seer by using a Tardis.

https://github.com/DanBurton/tardis/blob/master/Control/Seer.hs



回答2:

I wrote up an article on this topic at entitled Assembly: Circular Programming with Recursive do where I describe two methods for building an assembler using knot tying. Like your problem, an assembler has to be able to resolve address of labels that may occur later in the file.



回答3:

Regarding the implementation, I would make it a composition of a Reader monad (for the future) and a State monad (for past/present). The reason is that you set your future only once (in tie) and then don't change it.

{-# LANGUAGE DoRec, GeneralizedNewtypeDeriving #-}

import Control.Monad.State
import Control.Monad.Reader
import Control.Applicative

newtype RecStateT s m a = RecStateT (StateT s (ReaderT s m) a) deriving
  ( Alternative
  , Applicative
  , Functor
  , Monad
  , MonadPlus
  )

tie :: MonadFix m => RecStateT s m a -> s -> m (a, s)
tie (RecStateT m) s = do
  rec (a, s') <- flip runReaderT s' $ flip runStateT s m
  return (a, s')

getPast :: Monad m => RecStateT s m s
getPast = RecStateT get

getFuture :: Monad m => RecStateT s m s
getFuture = RecStateT ask

putPresent :: Monad m => s -> RecStateT s m ()
putPresent = RecStateT . put

Regarding your second question, it'd help to know your dataflow (i.e. to have a minimal example of your code). It's not true that strict patterns always lead to loops. It's true that you need to be careful so as not to create a non-producing loop, but the exact restrictions depend on what and how you're building.



回答4:

I'm kind of overwhelmed by the amount of Monad usage. I might not understand the past/future things, but I guess you are just trying to express the lazy+fixpoint binding. (Correct me if I'm wrong.) The RWS Monad usage with R=W is kind of funny, but you do not need the State and the loop, when you can do the same with fmap. There is no point in using Monads if they do not make things easier. (Only very few Monads represent chronological order, anyway.)

My general solution to tying the knot:

  1. I parse everything to a List of nodes,
  2. convert that list to a Data.Vector for O(1) access to boxed (=lazy) values,
  3. bind that result to a name using let or the fix or mfix function,
  4. and access that named Vector inside the parser. (see 1.)

That example solution in your blog, where you write sth. like this:

data Node = Node {
  value :: Int,
  next  :: Node
} deriving Show
…
tie = …
parse = …
data ParserState = …
…
example :: Node
example =
  let (_, _, m) = tie parse $ ParserState 0 [(0, 1), (1, 2), (2, 0)]
  in (m Map.! 0)

I would have written this way:

{-# LANGUAGE ViewPatterns, NamedFieldPuns #-}
import Data.Vector as Vector

example :: Node
example =
   let node :: Int -> Node
       node = (Vector.!) $ Vector.fromList $
                   [ Node{value,next}
                   | (value,node->next) <- [(0, 1), (1, 2), (2, 0)]
                   ]
   in (node 0)

or shorter:

{-# LANGUAGE ViewPatterns, NamedFieldPuns #-}
import Data.Vector as Vector

example :: Node
example = (\node->(Vector.fromList[ Node{value,next}
                                  | (value,node->next) <- [(0, 1), (1, 2), (2, 0)]
                                  ] Vector.!)) `fix` 0


回答5:

I had a similar problem recently, but I chose a different approach. A recursive data structure can be represented as a type fixed point on a data type functor. Loading data can be then split into two parts:

  • Load the data into a structure that references other nodes only by some kind of identifier. In the example it's Loader Int (NodeF Int), which constructs a map of values of type NodeF Int Int.
  • Tie the knot by creating a recursive data structure by replacing the identifiers with actual data. In the example the resulting data structures have type Fix (NodeF Int), and they are later converted to Node Int for convenience.

It's lacking a proper error handling etc., but the idea should be clear from that.

-- Public Domain

import Control.Monad
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromJust)

-- Fixed point operator on types and catamohism/anamorphism methods
-- for constructing/deconstructing them:

newtype Fix f = Fix { unfix :: f (Fix f) }

catam :: Functor f => (f a -> a) -> (Fix f -> a)
catam f = f . fmap (catam f) . unfix

anam :: Functor f => (a -> f a) -> (a -> Fix f)
anam f = Fix . fmap (anam f) . f

anam' :: Functor f => (a -> f a) -> (f a -> Fix f)
anam' f = Fix . fmap (anam f)

-- The loader itself

-- A representation of a loader. Type parameter 'k' represents the keys by
-- which the nodes are represented. Type parameter 'v' represents a functor
-- data type representing the values.
data Loader k v = Loader (Map k (v k))

-- | Creates an empty loader.
empty :: Loader k v
empty = Loader $ Map.empty

-- | Adds a new node into a loader.
update :: (Ord k) => k -> v k -> Loader k v -> Loader k v
update k v = update' k (const v)

-- | Modifies a node in a loader.
update' :: (Ord k) => k -> (Maybe (v k) -> (v k)) -> Loader k v -> Loader k v
update' k f (Loader m) = Loader $ Map.insertWith (const (f . Just)) k (f Nothing) $ m

-- | Does the actual knot-tying. Creates a new data structure
-- where the references to nodes are replaced by the actual data.
tie :: (Ord k, Functor v) => Loader k v -> Map k (Fix v)
tie (Loader m) = Map.map (anam' $ \k -> fromJust (Map.lookup k m)) m


-- -----------------------------------------------------------------
-- Usage example:

data NodeF n t = NodeF n [t]
instance Functor (NodeF n) where
    fmap f (NodeF n xs) = NodeF n (map f xs)

-- A data structure isomorphic to Fix (NodeF n), but easier to work with.
data Node n = Node n [Node n]
  deriving Show
-- The isomorphism that does the conversion.
nodeunfix :: Fix (NodeF n) -> Node n
nodeunfix = catam (\(NodeF n ts) -> Node n ts)

main :: IO ()
main = do
    -- Each node description consist of an integer ID and a list of other nodes
    -- it references.
    let lss = 
            [ (1, [4])
            , (2, [1])
            , (3, [2, 1])
            , (4, [3, 2, 1])
            , (5, [5])
            ]
    print lss
    -- Fill a new loader with the data:
    let
        loader = foldr f empty lss
        f (label, dependsOn) = update label (NodeF label dependsOn)
    -- Tie the knot:
    let tied' = tie loader
    -- And convert Fix (NodeF n) into Node n:
    let tied = Map.map nodeunfix tied'

    -- For each node print the label of the first node it references
    -- and the count of all referenced nodes.
    print $ Map.map (\(Node n ls@((Node n1 _) : _)) -> (n1, length ls)) tied