To get acquainted with unsafePerformIO
(how to use it and when to use it), I've implemented a module for generating unique values.
Here's what I have:
module Unique (newUnique) where
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)
-- Type to represent a unique thing.
-- Show is derived just for testing purposes.
newtype Unique = U Integer
deriving Show
-- I believe this is the Haskell'98 derived instance, but
-- I want to be explicit, since its Eq instance is the most
-- important part of Unique.
instance Eq Unique where
(U x) == (U y) = x == y
counter :: IORef Integer
counter = unsafePerformIO $ newIORef 0
updateCounter :: IO ()
updateCounter = do
x <- readIORef counter
writeIORef counter (x+1)
readCounter :: IO Integer
readCounter = readIORef counter
newUnique' :: IO Unique
newUnique' = do { x <- readIORef counter
; writeIORef counter (x+1)
; return $ U x }
newUnique :: () -> Unique
newUnique () = unsafePerformIO newUnique'
To my delight, the package called Data.Unique
chose the same datatype as I did; on the other hand, they chose the type newUnique :: IO Unique
, but I want to stay out of IO
if possible.
Is this implementation dangerous? Could it possibly lead GHC to change the semantics of a program which uses it?
Treat unsafePerformIO
as a promise to the compiler. It says "I promise that you can treat this IO action as if it were a pure value and nothing will go wrong". It's useful because there are times you can build a pure interface to a computation implemented with impure operations, but it's impossible for the compiler to verify when this is the case; instead unsafePerformIO
allows you to put your hand on your heart and swear that you have verified that the impure computation is actually pure, so the compiler can simply trust that it is.
In this case that promise is false. If newUnique
were a pure function then let x = newUnique () in (x, x)
and (newUnique (), newUnique ())
would be equivalent expressions. But you would want these two expressions to have different results; a pair of duplicates of the same Unique
value in one case, and a pair of two different Unique
values in the other. With your code, there's really no way to say what either expression means. They can only be understood by considering the actual sequence of operations the program will carry out at runtime, and control over that is exactly what you're relinquishing when you use unsafePerformIO
. unsafePerformIO
says it doesn't matter whether either expression is compiled as one execution of newUnique
or two, and any implementation of Haskell is free to choose whatever it likes each and every time it encounters such code.
The purpose of unsafePerformIO
is when your function does some action internally, but has no side effects that an observer would notice. For example, a function that take a vector, copies it, quicksorts the copy in-place, then returns the copy. (see comments) Each of these operations has side effects, and so is in IO
, but the overall result does not.
newUnique
must be an IO
action because it generates something different every time. This is basically the definition of IO
, it means a verb, as opposed to functions which are adjectives. A function will always return the same result for the same arguments. This is called referential transparency.
For valid uses of unsafePerformIO
, see this question.
Yes, your module is dangerous. Consider this example:
module Main where
import Unique
main = do
print $ newUnique ()
print $ newUnique ()
Compile and run:
$ ghc Main.hs
$ ./Main
U 0
U 1
Compile with optimization and run:
$ \rm *.{hi,o}
$ ghc -O Main.hs
$ ./Main
U 0
U 0
Uh-oh!
Adding {-# NOINLINE counter #-}
and {-# NOINLINE newUnique #-}
does not help, so I'm not actually sure what's happening here ...
1st UPDATE
Looking at the GHC core, I see that @LambdaFairy was correct that
constant subexpression elimination (CSE) caused my newUnique ()
expressions to be lifted. However, preventing CSE with -fno-cse
and
adding {-# NOINLINE counter #-}
to Unique.hs
is not sufficient to
make the optimized program print the same as the unoptimized program!
In particular, it seems that counter
is inlined even with the
NOINLINE
pragma in Unique.hs
. Does anyone understand why?
I've uploaded the full versions of the following core files at
https://gist.github.com/ntc2/6986500.
The (relevant) core for main
when compiling with -O
:
main3 :: Unique.Unique
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 20 0}]
main3 = Unique.newUnique ()
main2 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 40 0}]
main2 =
Unique.$w$cshowsPrec 0 main3 ([] @ Char)
main4 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 40 0}]
main4 =
Unique.$w$cshowsPrec 0 main3 ([] @ Char)
main1
:: State# RealWorld
-> (# State# RealWorld, () #)
[GblId,
Arity=1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [0] 110 0}]
main1 =
\ (eta_B1 :: State# RealWorld) ->
case Handle.Text.hPutStr2
Handle.FD.stdout main4 True eta_B1
of _ { (# new_s_atQ, _ #) ->
Handle.Text.hPutStr2
Handle.FD.stdout main2 True new_s_atQ
}
Note that the newUnique ()
calls have been lifted and bound to
main3
.
And now when compiling with -O -fno-cse
:
main3 :: Unique.Unique
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 20 0}]
main3 = Unique.newUnique ()
main2 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 40 0}]
main2 =
Unique.$w$cshowsPrec 0 main3 ([] @ Char)
main5 :: Unique.Unique
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 20 0}]
main5 = Unique.newUnique ()
main4 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 40 0}]
main4 =
Unique.$w$cshowsPrec 0 main5 ([] @ Char)
main1
:: State# RealWorld
-> (# State# RealWorld, () #)
[GblId,
Arity=1,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [0] 110 0}]
main1 =
\ (eta_B1 :: State# RealWorld) ->
case Handle.Text.hPutStr2
Handle.FD.stdout main4 True eta_B1
of _ { (# new_s_atV, _ #) ->
Handle.Text.hPutStr2
Handle.FD.stdout main2 True new_s_atV
}
Note that main3
and main5
are the two separate newUnique ()
calls.
However:
rm *.hi *o Main
ghc -O -fno-cse Main.hs && ./Main
U 0
U 0
Looking at the core for this modified Unique.hs
:
module Unique (newUnique) where
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)
-- Type to represent a unique thing.
-- Show is derived just for testing purposes.
newtype Unique = U Integer
deriving Show
{-# NOINLINE counter #-}
counter :: IORef Integer
counter = unsafePerformIO $ newIORef 0
newUnique' :: IO Unique
newUnique' = do { x <- readIORef counter
; writeIORef counter (x+1)
; return $ U x }
{-# NOINLINE newUnique #-}
newUnique :: () -> Unique
newUnique () = unsafePerformIO newUnique'
it seems that counter
is being inlined as counter_rag
, despite the NOINLINE
pragma (2nd update: wrong! counter_rag
is not marked with [InlPrag=NOINLINE]
, but that doesn't mean it's been inlined; rather, counter_rag
is just the munged name of counter
); the NOINLINE
for newUnique
is respected though:
counter_rag :: IORef Type.Integer
counter_rag =
unsafeDupablePerformIO
@ (IORef Type.Integer)
(lvl1_rvg
`cast` (Sym
(NTCo:IO <IORef Type.Integer>)
:: (State# RealWorld
-> (# State# RealWorld,
IORef Type.Integer #))
~#
IO (IORef Type.Integer)))
[...]
lvl3_rvi
:: State# RealWorld
-> (# State# RealWorld, Unique.Unique #)
[GblId, Arity=1]
lvl3_rvi =
\ (s_aqi :: State# RealWorld) ->
case noDuplicate# s_aqi of s'_aqj { __DEFAULT ->
case counter_rag
`cast` (NTCo:IORef <Type.Integer>
:: IORef Type.Integer
~#
STRef RealWorld Type.Integer)
of _ { STRef var#_au4 ->
case readMutVar#
@ RealWorld @ Type.Integer var#_au4 s'_aqj
of _ { (# new_s_atV, a_atW #) ->
case writeMutVar#
@ RealWorld
@ Type.Integer
var#_au4
(Type.plusInteger a_atW lvl2_rvh)
new_s_atV
of s2#_auo { __DEFAULT ->
(# s2#_auo,
a_atW
`cast` (Sym (Unique.NTCo:Unique)
:: Type.Integer ~# Unique.Unique) #)
}
}
}
}
lvl4_rvj :: Unique.Unique
lvl4_rvj =
unsafeDupablePerformIO
@ Unique.Unique
(lvl3_rvi
`cast` (Sym (NTCo:IO <Unique.Unique>)
:: (State# RealWorld
-> (# State# RealWorld, Unique.Unique #))
~#
IO Unique.Unique))
Unique.newUnique [InlPrag=NOINLINE] :: () -> Unique.Unique
Unique.newUnique =
\ (ds_dq8 :: ()) -> case ds_dq8 of _ { () -> lvl4_rvj }
What's going on here?
2nd UPDATE
User @errge figured it out.
Looking more carefully that the last core output pasted above, we see
that most of the body of Unique.newUnique
has been floated to the
top level as lvl4_rvj
. However, lvl4_rvj
is a constant
expression, not a function, and so it's only evaluated once,
explaining the repeated U 0
output by main
.
Indeed:
rm *.hi *o Main
ghc -O -fno-cse -fno-full-laziness Main.hs && ./Main
U 0
U 1
I don't understand exactly what the -ffull-laziness
optimization
does -- the
GHC docs
talk about floating let bindings, but the body of lvl4_rvj
does not
appear to have been a let binding -- but we can at least compare the above core with
the core generated with -fno-full-laziness
and see that now the body is not lifted:
Unique.newUnique [InlPrag=NOINLINE] :: () -> Unique.Unique
Unique.newUnique =
\ (ds_drR :: ()) ->
case ds_drR of _ { () ->
unsafeDupablePerformIO
@ Unique.Unique
((\ (s_as1 :: State# RealWorld) ->
case noDuplicate# s_as1 of s'_as2 { __DEFAULT ->
case counter_rfj
`cast` (<NTCo:IORef> <Type.Integer>
:: IORef Type.Integer
~#
STRef RealWorld Type.Integer)
of _ { STRef var#_avI ->
case readMutVar#
@ RealWorld @ Type.Integer var#_avI s'_as2
of _ { (# ipv_avz, ipv1_avA #) ->
case writeMutVar#
@ RealWorld
@ Type.Integer
var#_avI
(Type.plusInteger ipv1_avA (__integer 1))
ipv_avz
of s2#_aw2 { __DEFAULT ->
(# s2#_aw2,
ipv1_avA
`cast` (Sym <(Unique.NTCo:Unique)>
:: Type.Integer ~# Unique.Unique) #)
}
}
}
})
`cast` (Sym <(NTCo:IO <Unique.Unique>)>
:: (State# RealWorld
-> (# State# RealWorld, Unique.Unique #))
~#
IO Unique.Unique))
}
Here counter_rfj
corresponds to counter
again, and we see the
difference is that the body of Unique.newUnique
has not been lifted,
and so the reference updating (readMutVar
, writeMutVar
) code will be
run each time Unique.newUnique
is called.
I've updated the gist to
include the new -fno-full-laziness
core file. The earlier core
files were generated on a different computer, so some minor
differences here are unrelated to -fno-full-laziness
.
See an another example how this fails:
module Main where
import Unique
helper :: Int -> Unique
-- noinline pragma here doesn't matter
helper x = newUnique ()
main = do
print $ helper 3
print $ helper 4
With this code the effect is the same as in ntc2's example: correct with -O0, but incorrect with -O. But in this code there is no "common subexpression to eliminate".
What's actually happening here is that the newUnique ()
expression is "floated out" to the top-level, because it doesn't depend on the function's parameters. In GHC speak this is -ffull-laziness
(on by default with -O
, can be turned off with -O -fno-full-laziness
).
So the code effectively becomes this:
helperworker = newUnique ()
helper x = helperworker
And here helperworker is a thunk that can only be evaluated once.
With the already recommended NOINLINE pragmas if you add -fno-full-laziness
to the command line, then it works as expected.