Unlike other unsafe* operations, the documentation for unsafeInterleaveIO
is not very clear about its possible pitfalls. So exactly when is it unsafe? I would like to know the condition for both parallel/concurrent and the single threaded usage.
More specifically, are the two functions in the following code semantically equivalent? If not, when and how?
joinIO :: IO a -> (a -> IO b) -> IO b
joinIO a f = do !x <- a
!x' <- f x
return x'
joinIO':: IO a -> (a -> IO b) -> IO b
joinIO' a f = do !x <- unsafeInterleaveIO a
!x' <- unsafeInterleaveIO $ f x
return x'
Here's how I would use this in practice:
data LIO a = LIO {runLIO :: IO a}
instance Functor LIO where
fmap f (LIO a) = LIO (fmap f a)
instance Monad LIO where
return x = LIO $ return x
a >>= f = LIO $ lazily a >>= lazily . f
where
lazily = unsafeInterleaveIO . runLIO
iterateLIO :: (a -> LIO a) -> a -> LIO [a]
iterateLIO f x = do
x' <- f x
xs <- iterateLIO f x' -- IO monad would diverge here
return $ x:xs
limitLIO :: (a -> LIO a) -> a -> (a -> a -> Bool) -> LIO a
limitLIO f a converged = do
xs <- iterateLIO f a
return . snd . head . filter (uncurry converged) $ zip xs (tail xs)
root2 = runLIO $ limitLIO newtonLIO 1 converged
where
newtonLIO x = do () <- LIO $ print x
LIO $ print "lazy io"
return $ x - f x / f' x
f x = x^2 -2
f' x = 2 * x
converged x x' = abs (x-x') < 1E-15
Although I would rather avoid using this code in serious applications because of the terrifying unsafe*
stuff, I could at least be lazier than would be possible with the stricter IO monad in deciding what 'convergence' means, leading to (what I think is) more idiomatic Haskell. And this brings up another question:why is it not the default semantics for Haskell's (or GHC's?) IO monad? I've heard some resource management issues for lazy IO (which GHC only provides by a small fixed set of commands), but the examples typically given somewhat resemble like a broken makefile:a resource X depends on a resource Y, but if you fail to specify the dependency, you get an undefined status for X. Is lazy IO really the culprit for this problem? (On the other hand, if there is a subtle concurrency bug in the above code such as deadlocks I would take it as a more fundamental problem.)
Update
Reading Ben's and Dietrich's answer and his comments below, I have briefly browsed the ghc source code to see how the IO monad is implemented in GHC. Here I summerize my few findings.
GHC implements Haskell as an impure, non-referentially-transparent language. GHC's runtime operates by successively evaluating impure functions with side effects just like any other functional languages. This is why the evaluation order matters.
unsafeInterleaveIO
is unsafe because it can introduce any kind of concurrency bugs even in a sigle-threaded program by exposing the (usually) hidden impurity of GHC's Haskell. (iteratee
seems to be a nice and elegant solution for this, and I will certainly learn how to use it.)the IO monad must be strict because a safe, lazy IO monad would require a precise (lifted) representation of the RealWorld, which seems impossible.
It's not just the IO monad and
unsafe
functions that are unsafe. The whole Haskell (as implemented by GHC) is potentially unsafe, and 'pure' functions in (GHC's) Haskell are only pure by convention and the people's goodwill. Types can never be a proof for purity.
To see this, I demonstrate how GHC's Haskell is not referentially transparent regardless of the IO monad, regardless of the unsafe*
functions,etc.
-- An evil example of a function whose result depends on a particular
-- evaluation order without reference to unsafe* functions or even
-- the IO monad.
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE BangPatterns #-}
import GHC.Prim
f :: Int -> Int
f x = let v = myVar 1
-- removing the strictness in the following changes the result
!x' = h v x
in g v x'
g :: MutVar# RealWorld Int -> Int -> Int
g v x = let !y = addMyVar v 1
in x * y
h :: MutVar# RealWorld Int -> Int -> Int
h v x = let !y = readMyVar v
in x + y
myVar :: Int -> MutVar# (RealWorld) Int
myVar x =
case newMutVar# x realWorld# of
(# _ , v #) -> v
readMyVar :: MutVar# (RealWorld) Int -> Int
readMyVar v =
case readMutVar# v realWorld# of
(# _ , x #) -> x
addMyVar :: MutVar# (RealWorld) Int -> Int -> Int
addMyVar v x =
case readMutVar# v realWorld# of
(# s , y #) ->
case writeMutVar# v (x+y) s of
s' -> x + y
main = print $ f 1
Just for easy reference, I collected some of the relevant definitions for the IO monad as implemented by GHC. (All the paths below are relative to the top directory of the ghc's source repository.)
-- Firstly, according to "libraries/base/GHC/IO.hs",
{-
The IO Monad is just an instance of the ST monad, where the state is
the real world. We use the exception mechanism (in GHC.Exception) to
implement IO exceptions.
...
-}
-- And indeed in "libraries/ghc-prim/GHC/Types.hs", We have
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
-- And in "libraries/base/GHC/Base.lhs", we have the Monad instance for IO:
data RealWorld
instance Functor IO where
fmap f x = x >>= (return . f)
instance Monad IO where
m >> k = m >>= \ _ -> k
return = returnIO
(>>=) = bindIO
fail s = failIO s
returnIO :: a -> IO a
returnIO x = IO $ \ s -> (# s, x #)
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (IO m) k = IO $ \ s -> case m s of (# new_s, a #) -> unIO (k a) new_s
unIO :: IO a -> (State# RealWorld -> (# State# RealWorld, a #))
unIO (IO a) = a
-- Many of the unsafe* functions are defined in "libraries/base/GHC/IO.hs":
unsafePerformIO :: IO a -> a
unsafePerformIO m = unsafeDupablePerformIO (noDuplicate >> m)
unsafeDupablePerformIO :: IO a -> a
unsafeDupablePerformIO (IO m) = lazy (case m realWorld# of (# _, r #) -> r)
unsafeInterleaveIO :: IO a -> IO a
unsafeInterleaveIO m = unsafeDupableInterleaveIO (noDuplicate >> m)
unsafeDupableInterleaveIO :: IO a -> IO a
unsafeDupableInterleaveIO (IO m)
= IO ( \ s -> let
r = case m s of (# _, res #) -> res
in
(# s, r #))
noDuplicate :: IO ()
noDuplicate = IO $ \s -> case noDuplicate# s of s' -> (# s', () #)
-- The auto-generated file "libraries/ghc-prim/dist-install/build/autogen/GHC/Prim.hs"
-- list types of all the primitive impure functions. For example,
data MutVar# s a
data State# s
newMutVar# :: a -> State# s -> (# State# s,MutVar# s a #)
-- The actual implementations are found in "rts/PrimOps.cmm".
So, for example, ignoring the constructor and assuming referential transparency, we have
unsafeDupableInterleaveIO m >>= f
==> (let u = unsafeDupableInterleaveIO)
u m >>= f
==> (definition of (>>=) and ignore the constructor)
\s -> case u m s of
(# s',a' #) -> f a' s'
==> (definition of u and let snd# x = case x of (# _,r #) -> r)
\s -> case (let r = snd# (m s)
in (# s,r #)
) of
(# s',a' #) -> f a' s'
==>
\s -> let r = snd# (m s)
in
case (# s, r #) of
(# s', a' #) -> f a' s'
==>
\s -> f (snd# (m s)) s
This is not what we would normally get from binding usual lazy state monads.
Assuming the state variable s
carries some real meaning (which it does not), it looks more like a concurrent IO (or interleaved IO as the function rightly says) than a lazy IO as we would normally mean by 'lazy state monad' wherein despite the laziness the states are properly threaded by an associative operation.
I tried to implement a truely lazy IO monad, but soon realized that in order to define a lazy monadic composition for the IO datatype, we need to be able to lift/unlift the RealWorld
. However this seems impossible because there is no constructor for both State# s
and RealWorld
. And even if that were possible, I would then have to represent the precise, functional represenation of our RealWorld which is impossible,too.
But I'm still not sure whether the standard Haskell 2010 breaks referential transparency or the lazy IO is bad in itself. At least it seems entirely possible to build a small model of the RealWorld on which the lazy IO is perfectly safe and predictable. And there might be a good enough approximation that serves many practical purposes without breaking the referential transparency.