-->

Can a `ST`-like monad be executed purely (without

2019-01-31 18:42发布

问题:

This post is literate Haskell. Just put in a file like "pad.lhs" and ghci will be able to run it.

> {-# LANGUAGE GADTs, Rank2Types #-}
> import Control.Monad
> import Control.Monad.ST
> import Data.STRef

Okay, so I was able to figure how to represent the ST monad in pure code. First we start with our reference type. Its specific value is not really important. The most important thing is that PT s a should not be isomorphic to any other type forall s. (In particular, it should be isomorphic to neither () nor Void.)

> newtype PTRef s a = Ref {unref :: s a} -- This is defined liked this to make `toST'` work. It may be given a different definition.

The kind for s is *->*, but that is not really important right now. It could be polykind, for all we care.

> data PT s a where
>     MkRef   :: a -> PT s (PTRef s a)
>     GetRef  :: PTRef s a -> PT s a
>     PutRef  :: a -> PTRef s a -> PT s ()
>     AndThen :: PT s a -> (a -> PT s b) -> PT s b

Pretty straight forward. AndThen allows us to use this as a Monad. You may be wondering how return is implemented. Here is its monad instance (it only respects monad laws with respect to runPF, to be defined later):

> instance Monad (PT s) where
>     (>>=)    = AndThen
>     return a = AndThen (MkRef a) GetRef --Sorry. I like minimalism.
> instance Functor (PT s) where
>     fmap = liftM
> instance Applicative (PT s) where
>     pure  = return
>     (<*>) = ap

Now we can define fib as a test case.

> fib :: Int -> PT s Integer
> fib n = do
>     rold <- MkRef 0
>     rnew <- MkRef 1
>     replicateM_ n $ do
>         old <- GetRef rold
>         new <- GetRef rnew
>         PutRef      new  rold
>         PutRef (old+new) rnew
>     GetRef rold

And it type checks. Hurray! Now, I was able to convert this to ST (we now see why s had to be * -> *)

> toST :: PT (STRef s) a -> ST s a
> toST (MkRef  a        ) = fmap Ref $ newSTRef a
> toST (GetRef   (Ref r)) = readSTRef r
> toST (PutRef a (Ref r)) = writeSTRef r a
> toST (pa `AndThen` apb) = (toST pa) >>= (toST . apb)

Now we can define a function to run PT without referencing ST at all:

> runPF :: (forall s. PT s a) -> a
> runPF p = runST $ toST p

runPF $ fib 7 gives 13, which is correct.


My question is can we define runPF without reference using ST at all?

Is there a pure way to define runPF? PTRef's definition is completely unimportant; it's only a placeholder type anyway. It can be redefined to whatever makes it work.

If you cannot define runPF purely, give a proof that it cannot.

Performance is not a concern (if it was, I would not have made every return have its own ref).

I'm thinking that existential types may be useful.

Note: It's trivial if we assume is a is dynamicable or something. I'm looking for an answer that works with all a.

Note: In fact, an answer does not even necessarily have much to do with PT. It just needs to be as powerful as ST without using magic. (Conversion from (forall s. PT s) is sort of a test of if an answer is valid or not.)

回答1:

tl;dr: It's not possible without adjustments to the definition of PT. Here's the core problem: you'll be running your stateful computation in the context of some sort of storage medium, but said storage medium has to know how to store arbitrary types. This isn't possible without packaging up some sort of evidence into the MkRef constructor - either an existentially wrapped Typeable dictionary as others have suggested, or a proof that the value belongs to one of a known finite set of types.

For a first attempt, let's try using a list as the storage medium and integers to refer to elements of the list.

newtype Ix a = MkIx Int  -- the index of an element in a list

interp :: PT Ix a -> State [b] a
interp (MkRef x) = modify (++ [x]) >> gets (Ref . MkIx . length)
-- ...

When storing a new item in the environment, we make sure to add it to the end of the list, so that Refs we've previously given out stay pointing at the correct element.

This ain't right. I can make a reference to any type a, but the type of interp says that the storage medium is a homogeneous list of bs. GHC has us bang to rights when it rejects this type signature, complaining that it can't match b with the type of the thing inside MkRef.


Undeterred, let us have a go at using a heterogeneous list as the environment for the State monad in which we'll interpret PT.

infixr 4 :>
data Tuple as where
    E :: Tuple '[]
    (:>) :: a -> Tuple as -> Tuple (a ': as)

This is one of my personal favourite Haskell data types. It's an extensible tuple indexed by a list of the types of the things inside it. Tuples are heterogeneous linked lists with type-level information about the types of the things inside it. (It's often called HList following Kiselyov's paper but I prefer Tuple.) When you add something to the front of a tuple, you add its type to the front of the list of types. In a poetic mood, I once put it this way: "The tuple and its type grow together, like a vine creeping up a bamboo plant."

Examples of Tuples:

ghci> :t 'x' :> E
'x' :> E :: Tuple '[Char]
ghci> :t "hello" :> True :> E
"hello" :> True :> E :: Tuple '[[Char], Bool]

What do references to values inside tuples look like? We have to prove to GHC that the type of the thing we're getting out of the tuple is indeed the type we expect.

data Elem as a where  -- order of indices arranged for convenient partial application
    Here :: Elem (a ': as) a
    There :: Elem as a -> Elem (b ': as) a

The definition of Elem is structurally that of the natural numbers (Elem values like There (There Here) look similar to natural numbers like S (S Z)) but with extra types - in this case, proving that the type a is in the type-level list as. I mention this because it's suggestive: Nats make good list indices, and likewise Elem is useful for indexing into a tuple. In this respect it'll be useful as a replacement for the Int inside our reference type.

(!) :: Tuple as -> Elem as a -> a
(x :> xs) ! Here = x
(x :> xs) ! (There ix) = xs ! ix

We need a couple of functions to work with tuples and indices.

type family as :++: bs where
    '[] :++: bs = bs
    (a ': as) :++: bs = a ': (as :++: bs)

appendT :: a -> Tuple as -> (Tuple (as :++: '[a]), Elem (as :++: '[a]) a)
appendT x E = (x :> E, Here)
appendT x (y :> ys) = let (t, ix) = appendT x ys
                      in (y :> t, There ix)

Let's try and write an interpreter for a PT in a Tuple environment.

interp :: PT (Elem as) a -> State (Tuple as) a
interp (MkRef x) = do
    t <- get
    let (newT, el) = appendT x t
    put newT
    return el
-- ...

No can do, buster. The problem is that the type of the Tuple in the environment changes when we obtain a new reference. As I mentioned before, adding something to a tuple adds its type to the tuple's type, a fact belied by the type State (Tuple as) a. GHC's not fooled by this attempted subterfuge: Could not deduce (as ~ (as :++: '[a1])).


This is where the wheels come off, as far as I can tell. What you really want to do is keep the size of the tuple constant throughout a PT computation. This would require you to index PT itself by the list of types to which you can obtain references, proving every time you do so that you're allowed to (by giving an Elem value). The environment would then look like a tuple of lists, and a reference would consist an Elem (to select the right list) and an Int (to find the particular item in the list).

This plan breaks the rules, of course (you need to change the definition of PT), but it also has engineering problems. When I call MkRef, the onus is on me to give an Elem for the value I'm making a reference to, which is pretty tedious. (That said, you can usually convince GHC to find Elem values by proof search using a hacky type class.)

Another thing: composing PTs becomes difficult. All the parts of your computation have to be indexed by the same list of types. You could attempt to introduce combinators or classes which allow you to grow the environment of a PT, but you'd also have to update all the references when you do that. Using the monad would be quite difficult.

A possibly-cleaner implementation would allow the list of types in a PT to vary as you walk around the datatype: every time you encounter a MkRef the type gets one longer. Because the type of the computation changes as it progresses, you can't use a regular monad - you have to resort to IxMonad . If you want to know what that program looks like, see my other answer.

Ultimately, the sticking point is that the type of the tuple is determined by the value of the PT request. The environment is what a given request decides to store in it. interp doesn't get to choose what's in the tuple, it must come from an index on PT. Any attempt to cheat that requirement is going to crash and burn. Now, in a true dependently-typed system we could examine the PT value we were given and figure out what as should be. Alas, Haskell is not a dependently-typed system.



回答2:

A simple solution is to wrap a State monad and present the same API as ST. In this case there's no need to store runtime type information, since it can be determined from the type of STRef-s, and the usual ST s quantification trick lets us prevent users from messing up the container storing the references.

We keep ref-s in an IntMap and increment a counter each time we allocate a new ref. Reading and writing just modifies the IntMap with some unsafeCoerce sprinkled atop.

{-# LANGUAGE DeriveFunctor, GeneralizedNewtypeDeriving, RankNTypes, RoleAnnotations #-}

module PureST (ST, STRef, newSTRef, readSTRef, modifySTRef, runST) where

import Data.IntMap (IntMap, (!))
import qualified Data.IntMap as M

import Control.Monad
import Control.Applicative
import Control.Monad.Trans.State
import GHC.Prim (Any)
import Unsafe.Coerce (unsafeCoerce)

type role ST nominal representational
type role STRef nominal representational
newtype ST s a = ST (State (IntMap Any, Int) a) deriving (Functor, Applicative, Monad)
newtype STRef s a = STRef Int deriving Show

newSTRef :: a -> ST s (STRef s a)
newSTRef a = ST $ do
  (m, i) <- get
  put (M.insert i (unsafeCoerce a) m, i + 1)
  pure (STRef i)

readSTRef :: STRef s a -> ST s a
readSTRef (STRef i) = ST $ do
  (m, _) <- get
  pure (unsafeCoerce (m ! i))

writeSTRef :: STRef s a -> a -> ST s ()
writeSTRef (STRef i) a = ST $ 
  modify $ \(m, i') -> (M.insert i (unsafeCoerce a) m, i')

modifySTRef :: STRef s a -> (a -> a) -> ST s ()
modifySTRef (STRef i) f = ST $
  modify $ \(m, i') -> (M.adjust (unsafeCoerce f) i m, i')                      

runST :: (forall s. ST s a) -> a
runST (ST s) = evalState s (M.empty, 0)

foo :: Num a => ST s (a, Bool)
foo = do
  a <- newSTRef 0 
  modifySTRef a (+100)
  b <- newSTRef False
  modifySTRef b not
  (,) <$> readSTRef a <*> readSTRef b

Now we can do:

> runST foo
(100, True)

But the following fails with the usual ST type error:

> runST (newSTRef True)

Of course, the above scheme never garbage collects references, instead it frees up everything on each runST call. I think a more complex system could implement multiple distinct regions, each tagged by a type parameter, and allocate/free resources in a more fine-grained manner.

Also, the use of unsafeCoerce means here that using internals directly is every bit as dangerous as using GHC.ST internals and State# directly, so we should make sure to present a safe API, and also test our internals thoroughly (or else we may get segfaults in Haskell, a great sin).



回答3:

Since I posted my earlier answer, you've indicated that you don't mind making changes to your definition of PT. I am happy to report: relaxing that restriction changes the answer to your question from no to yes! I've already argued that you need to index your monad by the set of types in your storage medium, so here's some working code showing how to do that. (I originally had this as an edit to my previous answer but it got too long, so here we are.)

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE TypeOperators #-}

import Prelude

We're going to need a smarter Monad class than the one in the Prelude: that of indexed monad-like things describing paths through a directed graph. For reasons that should become apparent, I'm going to define indexed functors as well.

class FunctorIx f where
    imap :: (a -> b) -> f i j a -> f i j b

class FunctorIx m => MonadIx m where
    ireturn :: a -> m i i a
    (>>>=) :: m i j a -> (a -> m j k b) -> m i k b

(>>>) :: MonadIx m => m i j a -> m j k b -> m i k b
ma >>> mb = ma >>>= \_ -> mb

replicateM_ :: MonadIx m => Int -> m i i a -> m i i ()
replicateM_ 0 _ = ireturn ()
replicateM_ n m = m >>> replicateM_ (n - 1) m

An indexed monad uses the type system to track the progress of a stateful computation. m i j a is a monadic computation which requires an input state of i, changes the state to j, and produces a value of type a. Sequencing indexed monads with >>>= is like playing dominoes. You can feed a computation which takes the state from i to j into a computation which goes from j to k, and get a bigger computation from i to k. (There's a richer version of this indexed monad described in Kleisli Arrows of Outrageous Fortune (and elsewhere) but this one is quite enough for our purposes.)

One possibility with MonadIx is a File monad which tracks the state of a file handle, ensuring you don't forget to free resources. fOpen :: File Closed Open () starts with a closed file and opens it, fRead :: File Open Open String returns the contents of an opened file, and fClose :: File Open Closed () takes a file from open to closed. The run operation takes a computation of type File Closed Closed a, which ensures that your file handles always get cleaned up.

But I digress: here we are concerned not with a file handle but with a set of typed "memory locations"; the types of the things in the virtual machine's memory bank are what we'll use for the monad's indices. I like to get my "program/interpreter" monads for free because it expresses the fact that results live at the leaves of a computation, and promotes composability and code reuse, so here's the functor which will produce PT when we plug it into FreeIx below:

data PTF ref as bs r where
    MkRef_ :: a -> (ref (a ': as) a -> r) -> PTF ref as (a ': as) r
    GetRef_ :: ref as a -> (a -> r) -> PTF ref as as r
    PutRef_ :: a -> ref as a -> r -> PTF ref as as r

instance FunctorIx (PTF ref) where
    imap f (MkRef_ x next) = MkRef_ x (f . next)
    imap f (GetRef_ ref next) = GetRef_ ref (f . next)
    imap f (PutRef_ x ref next) = PutRef_ x ref (f next)

PTF is parameterised by the type of reference ref :: [*] -> * -> * - references are allowed to know which types are in the system - and indexed by the list of types being stored in the interpreter's "memory". The interesting case is MkRef_: making a new reference adds a value of type a to the memory, taking as to a ': as; the continuation expects a ref in the extended environment. The other operations don't change the list of types in the system.

When I create references sequentially (x <- mkRef 1; y <- mkRef 2), they'll have different types: the first will be a ref (a ': as) a and the second will be a ref (b ': a ': as) b. To make the types line up, I need a way to use a reference in a bigger environment than the one it was created in. In general, this operation depends on the type of reference, so I'll put it in a class.

class Expand ref where
    expand :: ref as a -> ref (b ': as) a

One possible generalisation of this class would wrap up the pattern of repeated applications of expand, with a type like inflate :: ref as a -> ref (bs :++: as) a.

Here's another reusable bit of infrastructure, the indexed free monad I mentioned earlier. FreeIx turns an indexed functor into an indexed monad by providing a type-aligned joining operation Free, which ties the recursive knot in the functor's parameter, and a do-nothing operation Pure.

data FreeIx f i j a where
    Pure :: a -> FreeIx f i i a
    Free :: f i j (FreeIx f j k a) -> FreeIx f i k a

lift :: FunctorIx f => f i j a -> FreeIx f i j a
lift f = Free (imap Pure f)

instance FunctorIx f => MonadIx (FreeIx f) where
    ireturn = Pure
    Pure x >>>= f = f x
    Free love {- , man -} >>>= f = Free $ imap (>>>= f) love

instance FunctorIx f => FunctorIx (FreeIx f) where
    imap f x = x >>>= (ireturn . f)

One disadvantage of free monads is the boilerplate you have to write to make Free and Pure easier to work with. Here are some single-action PTs which form the basis of the monad's API, and some pattern synonyms to hide the Free constructors when we unpack PT values.

type PT ref = FreeIx (PTF ref)

mkRef :: a -> PT ref as (a ': as) (ref (a ': as) a)
mkRef x = lift $ MkRef_ x id

getRef :: ref as a -> PT ref as as a
getRef ref = lift $ GetRef_ ref id

putRef :: a -> ref as a -> PT ref as as ()
putRef x ref = lift $ PutRef_ x ref ()

pattern MkRef x next = Free (MkRef_ x next)
pattern GetRef ref next = Free (GetRef_ ref next)
pattern PutRef x ref next = Free (PutRef_ x ref next)

That's everything we need to be able to write PT computations. Here's your fib example. I'm using RebindableSyntax and locally redefining the monad operators (to their indexed equivalents) so I can use do notation on my indexed monad.

-- fib adds two Ints to an arbitrary environment
fib :: Expand ref => Int -> PT ref as (Int ': Int ': as) Int
fib n = do
    rold' <- mkRef 0
    rnew <- mkRef 1
    let rold = expand rold'
    replicateM_ n $ do
        old <- getRef rold
        new <- getRef rnew
        putRef new rold
        putRef (old+new) rnew
    getRef rold
        where (>>=) = (>>>=)
              (>>) = (>>>)
              return :: MonadIx m => a -> m i i a
              return = ireturn
              fail :: MonadIx m => String -> m i j a
              fail = error

This version of fib looks just like the one you wanted to write in the original question. The only difference (apart from the local bindings of >>= and so on) is the call to expand. Every time you create a new reference, you have to expand all the old ones, which is a bit tedious.

Finally we can finish the job we set out to do and build a PT-machine which uses a Tuple as the storage medium and Elem as the reference type.

infixr 5 :>
data Tuple as where
    E :: Tuple '[]
    (:>) :: a -> Tuple as -> Tuple (a ': as)

data Elem as a where
    Here :: Elem (a ': as) a
    There :: Elem as a -> Elem (b ': as) a

(!) :: Tuple as -> Elem as a -> a
(x :> xs) ! Here = x
(x :> xs) ! There ix = xs ! ix

updateT :: Elem as a -> a -> Tuple as -> Tuple as
updateT Here x (y :> ys) = x :> ys
updateT (There ix) x (y :> ys) = y :> updateT ix x ys

To use an Elem in a larger tuple than the one you built it for, you just need to make it look further down the list.

instance Expand Elem where
    expand = There

Note that this deployment of Elem is rather like a de Bruijn index: more-recently-bound variables have smaller indices.

interp :: PT Elem as bs a -> Tuple as -> a
interp (MkRef x next) tup = let newTup = x :> tup
                            in interp (next $ Here) newTup
interp (GetRef ix next) tup = let x = tup ! ix
                              in interp (next x) tup
interp (PutRef x ix next) tup = let newTup = updateT ix x tup
                                in interp next newTup
interp (Pure x) tup = x

When the interpreter encounters a MkRef request, it increases the size of its memory by adding x to the front. The type checker will remind you that any refs from before the MkRef must be correctly expanded, so existing references don't get out of whack when the tuple changes size. We paid for an interpreter without unsafe casts, but we got referential integrity to boot.

Running from a standing start requires that the PT computation expects to begin with an empty memory bank, but we allow it to end in any state.

run :: (forall ref. Expand ref => PT ref '[] bs a) -> a
run x = interp x E

It typechecks, but does it work?

ghci> run (fib 5)
5
ghci> run (fib 3)
2