How do laziness and I/O work together in Haskell?

2020-08-15 02:06发布

问题:

I'm trying to get a deeper understanding of laziness in Haskell.

I was imagining the following snippet today:

data Image = Image { name :: String, pixels :: String }

image :: String -> IO Image
image path = Image path <$> readFile path

The appeal here is that I could simply create an Image instance and pass it around; if I need the image data it would be read lazily - if not, the time and memory cost of reading the file would be avoided:

 main = do
   image <- image "file"
   putStrLn $ length $ pixels image

But is that how it actually works? How is laziness compatible with IO? Will readFile be called regardless of whether I access pixels image or will the runtime leave that thunk unevaluated if I never refer to it?

If the image is indeed read lazily, then isn't it possible I/O actions could occur out of order? For example, what if immediately after calling image I delete the file? Now the putStrLn call will find nothing when it tries to read.

回答1:

How is laziness compatible with I/O?

Short answer: It isn't.


Long answer: IO actions are strictly sequenced, for pretty much the reasons you're thinking of. Any pure computations done with the results can be lazy, of course; for instance if you read in a file, do some processing, and then print out some of the results, it's likely that any processing not needed by the output won't be evaluated. However, the entire file will be read, even parts you never use. If you want lazy I/O, you have roughly two options:

  • Roll your own explicit lazy-loading routines and such, like you would in any strict language. Seems annoying, granted, but on the other hand Haskell makes a fine strict, imperative language. If you want to try something new and interesting, try looking at Iteratees.

  • Cheat like a cheating cheater. Functions such as hGetContents will do lazy, on-demand I/O for you, no questions asked. What's the catch? It (technically) breaks referential transparency. Pure code can indirectly cause side effects, and funny things can happen involving ordering of side effects if your code is really convoluted. hGetContents and friends are implemented using unsafeInterleaveIO, which is... exactly what it says on the tin. It's nowhere near as likely to blow up in your face as using unsafePerformIO, but consider yourself warned.



回答2:

Lazy I/O breaks Haskell's purity. The results from readFile are indeed produced lazily, on demand. The order in which I/O actions occur is not fixed, so yes, they could occur "out of order". The problem of deleting the file before pulling the pixels is real. In short, lazy I/O is a great convenience, but it's a tool with very sharp edges.

The book on Real World Haskell has a lengthy treatment of lazy I/O and goes over some of the pitfalls.