Am I abusing unsafePerformIO?

2019-03-10 22:54发布

To get acquainted with unsafePerformIO (how to use it and when to use it), I've implemented a module for generating unique values.

Here's what I have:

module Unique (newUnique) where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- Type to represent a unique thing.
-- Show is derived just for testing purposes.
newtype Unique = U Integer
  deriving Show

-- I believe this is the Haskell'98 derived instance, but
-- I want to be explicit, since its Eq instance is the most
-- important part of Unique.
instance Eq Unique where
  (U x) == (U y) = x == y

counter :: IORef Integer
counter = unsafePerformIO $ newIORef 0

updateCounter :: IO ()
updateCounter = do
  x <- readIORef counter
  writeIORef counter (x+1)

readCounter :: IO Integer
readCounter = readIORef counter

newUnique' :: IO Unique
newUnique' = do { x <- readIORef counter
                ; writeIORef counter (x+1)
                ; return $ U x }

newUnique :: () -> Unique
newUnique () = unsafePerformIO newUnique'

To my delight, the package called Data.Unique chose the same datatype as I did; on the other hand, they chose the type newUnique :: IO Unique, but I want to stay out of IO if possible.

Is this implementation dangerous? Could it possibly lead GHC to change the semantics of a program which uses it?

4条回答
forever°为你锁心
2楼-- · 2019-03-10 22:58

The purpose of unsafePerformIO is when your function does some action internally, but has no side effects that an observer would notice. For example, a function that take a vector, copies it, quicksorts the copy in-place, then returns the copy. (see comments) Each of these operations has side effects, and so is in IO, but the overall result does not.

newUnique must be an IO action because it generates something different every time. This is basically the definition of IO, it means a verb, as opposed to functions which are adjectives. A function will always return the same result for the same arguments. This is called referential transparency.

For valid uses of unsafePerformIO, see this question.

查看更多
时光不老,我们不散
3楼-- · 2019-03-10 23:12

See an another example how this fails:

module Main where
import Unique

helper :: Int -> Unique
-- noinline pragma here doesn't matter
helper x = newUnique ()

main = do
  print $ helper 3
  print $ helper 4

With this code the effect is the same as in ntc2's example: correct with -O0, but incorrect with -O. But in this code there is no "common subexpression to eliminate".

What's actually happening here is that the newUnique () expression is "floated out" to the top-level, because it doesn't depend on the function's parameters. In GHC speak this is -ffull-laziness (on by default with -O, can be turned off with -O -fno-full-laziness).

So the code effectively becomes this:

helperworker = newUnique ()
helper x = helperworker

And here helperworker is a thunk that can only be evaluated once.

With the already recommended NOINLINE pragmas if you add -fno-full-laziness to the command line, then it works as expected.

查看更多
唯我独甜
4楼-- · 2019-03-10 23:20

Treat unsafePerformIO as a promise to the compiler. It says "I promise that you can treat this IO action as if it were a pure value and nothing will go wrong". It's useful because there are times you can build a pure interface to a computation implemented with impure operations, but it's impossible for the compiler to verify when this is the case; instead unsafePerformIO allows you to put your hand on your heart and swear that you have verified that the impure computation is actually pure, so the compiler can simply trust that it is.

In this case that promise is false. If newUnique were a pure function then let x = newUnique () in (x, x) and (newUnique (), newUnique ()) would be equivalent expressions. But you would want these two expressions to have different results; a pair of duplicates of the same Unique value in one case, and a pair of two different Unique values in the other. With your code, there's really no way to say what either expression means. They can only be understood by considering the actual sequence of operations the program will carry out at runtime, and control over that is exactly what you're relinquishing when you use unsafePerformIO. unsafePerformIO says it doesn't matter whether either expression is compiled as one execution of newUnique or two, and any implementation of Haskell is free to choose whatever it likes each and every time it encounters such code.

查看更多
太酷不给撩
5楼-- · 2019-03-10 23:21

Yes, your module is dangerous. Consider this example:

module Main where
import Unique

main = do
  print $ newUnique ()
  print $ newUnique ()

Compile and run:

$ ghc Main.hs
$ ./Main
U 0
U 1

Compile with optimization and run:

$ \rm *.{hi,o}
$ ghc -O Main.hs
$ ./Main
U 0
U 0

Uh-oh!

Adding {-# NOINLINE counter #-} and {-# NOINLINE newUnique #-} does not help, so I'm not actually sure what's happening here ...

1st UPDATE

Looking at the GHC core, I see that @LambdaFairy was correct that constant subexpression elimination (CSE) caused my newUnique () expressions to be lifted. However, preventing CSE with -fno-cse and adding {-# NOINLINE counter #-} to Unique.hs is not sufficient to make the optimized program print the same as the unoptimized program! In particular, it seems that counter is inlined even with the NOINLINE pragma in Unique.hs. Does anyone understand why?

I've uploaded the full versions of the following core files at https://gist.github.com/ntc2/6986500.

The (relevant) core for main when compiling with -O:

main3 :: Unique.Unique
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 20 0}]
main3 = Unique.newUnique ()

main2 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main2 =
  Unique.$w$cshowsPrec 0 main3 ([] @ Char)

main4 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main4 =
  Unique.$w$cshowsPrec 0 main3 ([] @ Char)

main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
[GblId,
 Arity=1,

 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 110 0}]
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    case Handle.Text.hPutStr2
           Handle.FD.stdout main4 True eta_B1
    of _ { (# new_s_atQ, _ #) ->
    Handle.Text.hPutStr2
      Handle.FD.stdout main2 True new_s_atQ
    }

Note that the newUnique () calls have been lifted and bound to main3.

And now when compiling with -O -fno-cse:

main3 :: Unique.Unique
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 20 0}]
main3 = Unique.newUnique ()

main2 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main2 =
  Unique.$w$cshowsPrec 0 main3 ([] @ Char)

main5 :: Unique.Unique
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 20 0}]
main5 = Unique.newUnique ()

main4 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main4 =
  Unique.$w$cshowsPrec 0 main5 ([] @ Char)

main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
[GblId,
 Arity=1,

 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 110 0}]
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    case Handle.Text.hPutStr2
           Handle.FD.stdout main4 True eta_B1
    of _ { (# new_s_atV, _ #) ->
    Handle.Text.hPutStr2
      Handle.FD.stdout main2 True new_s_atV
    }

Note that main3 and main5 are the two separate newUnique () calls.

However:

rm *.hi *o Main
ghc -O -fno-cse Main.hs && ./Main
U 0
U 0

Looking at the core for this modified Unique.hs:

module Unique (newUnique) where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- Type to represent a unique thing.
-- Show is derived just for testing purposes.
newtype Unique = U Integer
  deriving Show

{-# NOINLINE counter #-}
counter :: IORef Integer
counter = unsafePerformIO $ newIORef 0

newUnique' :: IO Unique
newUnique' = do { x <- readIORef counter
                ; writeIORef counter (x+1)
                ; return $ U x }

{-# NOINLINE newUnique #-}
newUnique :: () -> Unique
newUnique () = unsafePerformIO newUnique'

it seems that counter is being inlined as counter_rag, despite the NOINLINE pragma (2nd update: wrong! counter_rag is not marked with [InlPrag=NOINLINE], but that doesn't mean it's been inlined; rather, counter_rag is just the munged name of counter); the NOINLINE for newUnique is respected though:

counter_rag :: IORef Type.Integer

counter_rag =
  unsafeDupablePerformIO
    @ (IORef Type.Integer)
    (lvl1_rvg
     `cast` (Sym
               (NTCo:IO <IORef Type.Integer>)
             :: (State# RealWorld
                 -> (# State# RealWorld,
                       IORef Type.Integer #))
                  ~#
                IO (IORef Type.Integer)))

[...]

lvl3_rvi
  :: State# RealWorld
     -> (# State# RealWorld, Unique.Unique #)
[GblId, Arity=1]
lvl3_rvi =
  \ (s_aqi :: State# RealWorld) ->
    case noDuplicate# s_aqi of s'_aqj { __DEFAULT ->
    case counter_rag
         `cast` (NTCo:IORef <Type.Integer>
                 :: IORef Type.Integer
                      ~#
                    STRef RealWorld Type.Integer)
    of _ { STRef var#_au4 ->
    case readMutVar#
           @ RealWorld @ Type.Integer var#_au4 s'_aqj
    of _ { (# new_s_atV, a_atW #) ->
    case writeMutVar#
           @ RealWorld
           @ Type.Integer
           var#_au4
           (Type.plusInteger a_atW lvl2_rvh)
           new_s_atV
    of s2#_auo { __DEFAULT ->
    (# s2#_auo,
       a_atW
       `cast` (Sym (Unique.NTCo:Unique)
               :: Type.Integer ~# Unique.Unique) #)
    }
    }
    }
    }

lvl4_rvj :: Unique.Unique

lvl4_rvj =
  unsafeDupablePerformIO
    @ Unique.Unique
    (lvl3_rvi
     `cast` (Sym (NTCo:IO <Unique.Unique>)
             :: (State# RealWorld
                 -> (# State# RealWorld, Unique.Unique #))
                  ~#
                IO Unique.Unique))

Unique.newUnique [InlPrag=NOINLINE] :: () -> Unique.Unique

Unique.newUnique =
  \ (ds_dq8 :: ()) -> case ds_dq8 of _ { () -> lvl4_rvj }

What's going on here?

2nd UPDATE

User @errge figured it out. Looking more carefully that the last core output pasted above, we see that most of the body of Unique.newUnique has been floated to the top level as lvl4_rvj. However, lvl4_rvj is a constant expression, not a function, and so it's only evaluated once, explaining the repeated U 0 output by main.

Indeed:

rm *.hi *o Main
ghc -O -fno-cse -fno-full-laziness Main.hs && ./Main
U 0
U 1

I don't understand exactly what the -ffull-laziness optimization does -- the GHC docs talk about floating let bindings, but the body of lvl4_rvj does not appear to have been a let binding -- but we can at least compare the above core with the core generated with -fno-full-laziness and see that now the body is not lifted:

Unique.newUnique [InlPrag=NOINLINE] :: () -> Unique.Unique

Unique.newUnique =
  \ (ds_drR :: ()) ->
    case ds_drR of _ { () ->
    unsafeDupablePerformIO
      @ Unique.Unique
      ((\ (s_as1 :: State# RealWorld) ->
          case noDuplicate# s_as1 of s'_as2 { __DEFAULT ->
          case counter_rfj
               `cast` (<NTCo:IORef> <Type.Integer>
                       :: IORef Type.Integer
                            ~#
                          STRef RealWorld Type.Integer)
          of _ { STRef var#_avI ->
          case readMutVar#
                 @ RealWorld @ Type.Integer var#_avI s'_as2
          of _ { (# ipv_avz, ipv1_avA #) ->
          case writeMutVar#
                 @ RealWorld
                 @ Type.Integer
                 var#_avI
                 (Type.plusInteger ipv1_avA (__integer 1))
                 ipv_avz
          of s2#_aw2 { __DEFAULT ->
          (# s2#_aw2,
             ipv1_avA
             `cast` (Sym <(Unique.NTCo:Unique)>
                     :: Type.Integer ~# Unique.Unique) #)
          }
          }
          }
          })
       `cast` (Sym <(NTCo:IO <Unique.Unique>)>
               :: (State# RealWorld
                   -> (# State# RealWorld, Unique.Unique #))
                    ~#
                  IO Unique.Unique))
    }

Here counter_rfj corresponds to counter again, and we see the difference is that the body of Unique.newUnique has not been lifted, and so the reference updating (readMutVar, writeMutVar) code will be run each time Unique.newUnique is called.

I've updated the gist to include the new -fno-full-laziness core file. The earlier core files were generated on a different computer, so some minor differences here are unrelated to -fno-full-laziness.

查看更多
登录 后发表回答