I'm writing a program that reads from a list of files. Each file either contains a link to the next file or marks that it's the end of the chain.
Being new to Haskell, it seemed to me that the idiomatic way to handle this is a lazy list of possible files. To this end, I have
getFirstFile :: String -> DataFile
getNextFile :: Maybe DataFile -> Maybe DataFile
loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile
getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles
So far, so good. The only problem is that, since getFirstFile and getNextFile both need to open files, I need their results to be in the IO monad. This gives the modified form of
getFirstFile :: String -> IO DataFile
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)
loadFiles :: String -> [IO (Maybe DataFile)]
loadFiles = iterate (getNextFile =<<) . fmap Just . getFirstFile
getFiles :: String -> IO [DataFile]
getFiles = liftM (map fromJust . takeWhile isJust) . sequence . loadFiles
The problem with this is that, since iterate returns an infinite list, sequence becomes an infinite loop. I'm not sure how to proceed from here. Is there a lazier form of sequence that won't hit all of the list elements? Should I be rejiggering the map and takeWhile to be operating inside the IO monad for each list element? Or do I need to drop the whole infinite list process and write a recursive function to terminate the list manually?
Yields:
As you have noticed, IO results can't be lazy, so you can't (easily) build an infinite list using IO. There is a way out, however, in unsafeInterleaveIO; with this, you can build the list (call it ioList) so that the IO for its tail is deferred until that tail is demanded.
It's important to be careful here, though - you've just deferred the results of ioList to some unpredictable time in the future. It may never be run at all, in fact. So be very careful when you're being Clever™ like this.
Personally, I would just build a manual recursive function.
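A minimal sketch of the unsafeInterleaveIO approach described above, not necessarily what the answer originally showed. The name lazyChain and its step argument are hypothetical; step stands in for "open the next file" and returns Nothing at the end of the chain.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Lazily build a list by repeatedly running an IO step.
-- 'lazyChain' and 'step' are illustrative names, not from the original post.
lazyChain :: (a -> IO (Maybe a)) -> a -> IO [a]
lazyChain step x = do
  -- The IO for the tail is deferred: it only runs when the tail is demanded.
  rest <- unsafeInterleaveIO $ do
    next <- step x
    case next of
      Nothing -> return []
      Just y  -> lazyChain step y
  return (x : rest)
```

Because the tail is deferred, even a chain that never ends can be consumed with take; the price is that the step runs at whatever moment the consumer happens to force the list.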
A step in the right direction
What puzzles me is getNextFile. Step into a simplified world with me, where we're not dealing with IO yet. The type is Maybe DataFile -> Maybe DataFile. In my opinion, this should simply be DataFile -> Maybe DataFile, and I will operate under the assumption that this adjustment is possible. And that looks like a good candidate for unfoldr. The first thing I am going to do is make my own simplified version of unfoldr (call it myUnfoldr), which is less general but simpler to use.
Now the type f :: a -> Maybe a of its step function matches getNextFile :: DataFile -> Maybe DataFile.
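A sketch of what such a simplified unfold and the resulting pure getFiles might look like. The DataFile type and the bodies of getFirstFile and getNextFile are made-up stand-ins for illustration only; the real ones would inspect file contents.

```haskell
import Data.List (unfoldr)

-- | Simplified unfold: the seed and the emitted value are the same thing.
-- Like 'iterate', but it stops once the step function returns Nothing.
-- The initial value is included in the result.
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x, x)

-- Placeholder type and stubs (assumptions, not from the original post).
newtype DataFile = DataFile Int deriving (Eq, Show)

-- Pretend the file name determines the first file's number.
getFirstFile :: String -> DataFile
getFirstFile = DataFile . length

-- Pretend each file links to the next one until number 3 is reached.
getNextFile :: DataFile -> Maybe DataFile
getNextFile (DataFile n)
  | n < 3     = Just (DataFile (n + 1))
  | otherwise = Nothing

getFiles :: String -> [DataFile]
getFiles = myUnfoldr getNextFile . getFirstFile
-- With these stubs, getFiles "a" == [DataFile 1, DataFile 2, DataFile 3]
```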
Beautiful, right?
unfoldr is a lot like iterate, except once it hits Nothing, it terminates the list.
Now, we have a problem. IO. How can we do the same thing with IO thrown in there? Don't even think about The Function Which Shall Not Be Named. We need a beefed up unfoldr to handle this. Fortunately, the source for unfoldr is available to us.
Now what do we need? A healthy dose of IO. liftM2 unfoldr almost gets us the right type, but won't quite cut it this time.
An actual solution
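For comparison, here is unfoldr written in the style of the Haskell Report definition (under the primed name unfoldr' so it does not clash with Data.List), followed by a sketch of a monadic counterpart. The library's actual source may differ in detail, and this unfoldrM is one plausible reconstruction rather than a quotation.

```haskell
-- | The plain list unfold, reproduced under a primed name.
unfoldr' :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr' f b =
  case f b of
    Just (a, b') -> a : unfoldr' f b'
    Nothing      -> []

-- | The same shape, with the step function running in a monad.
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      rest <- unfoldrM f b'
      return (a : rest)
    Nothing -> return []
```

Note that in IO this version runs every step before returning the list, so it suits a chain that is known to end.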
It is a rather straightforward transformation; I wonder if there is some combinator that could accomplish the same.
Fun fact: we can now define
unfoldr f b = runIdentity $ unfoldrM (return . f) b
Let's again define a simplified myUnfoldrM; we just have to sprinkle in a liftM.
And now we're all set, just like before.
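Putting the pieces together, a sketch of myUnfoldrM and an IO version of getFiles could look like the following. The DataFile type and the bodies of getFirstFile and getNextFile are made-up stand-ins (no real files are opened), and unfoldrM is repeated so the block compiles on its own.

```haskell
import Control.Monad (liftM)

-- | Monadic unfold, repeated here for self-containment.
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = f b >>= \res -> case res of
  Just (a, b') -> liftM (a :) (unfoldrM f b')
  Nothing      -> return []

-- | Simplified monadic unfold: seed and output coincide,
-- and the initial value is included in the result.
myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v :) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x, x)

-- Placeholder type and stubs (assumptions for illustration).
newtype DataFile = DataFile Int deriving (Eq, Show)

getFirstFile :: String -> IO DataFile
getFirstFile name = return (DataFile (length name))

getNextFile :: DataFile -> IO (Maybe DataFile)
getNextFile (DataFile n) =
  return (if n < 3 then Just (DataFile (n + 1)) else Nothing)

getFiles :: String -> IO [DataFile]
getFiles name = getFirstFile name >>= myUnfoldrM getNextFile
```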
By the way, I typechecked all of these with data DataFile = NoClueWhatGoesHere, and the type signatures for getFirstFile and getNextFile, with their definitions set to undefined.
[edit] changed myUnfoldr and myUnfoldrM to behave more like iterate, including the initial value in the list of results.
[edit] Additional insight on unfolds:
If you have a hard time wrapping your head around unfolds, the Collatz sequence is possibly one of the simplest examples.
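As a sketch of that example: the step function returns Nothing once the sequence reaches 1, and otherwise the next Collatz value. myUnfoldr is repeated here (in its assumed form, with the initial value included) so the block stands alone.

```haskell
import Data.List (unfoldr)

myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x, x)

-- | One Collatz step; the sequence ends when it hits 1.
collatz :: Integral a => a -> Maybe a
collatz 1 = Nothing
collatz n
  | even n    = Just (n `div` 2)
  | otherwise = Just (3 * n + 1)

collatzSequence :: Integral a => a -> [a]
collatzSequence = myUnfoldr collatz
-- collatzSequence 6 == [6,3,10,5,16,8,4,2,1]
```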
Remember, myUnfoldr is a simplified unfold for the cases where the "next seed" and the "current output value" are the same, as is the case for collatz. This behavior should be easy to see given myUnfoldr's simple definition in terms of unfoldr and tuplefy x = (x,x).
More, mostly unrelated thoughts
The rest has absolutely nothing to do with the question, but I just couldn't resist musing. We can define myUnfoldr in terms of myUnfoldrM:
myUnfoldr f v = runIdentity $ myUnfoldrM (return . f) v
Look familiar? We can even abstract this pattern:
sinkM should work to "sink" (opposite of "lift") any function of the form Monad m => (a -> m b) -> a -> m c, since the Monad m in those functions can be unified with the Identity monad constraint of sinkM. However, I don't see anything that sinkM would actually be useful for.
Laziness and I/O are a tricky combination. Using unsafeInterleaveIO is one way to produce lazy lists in the IO monad (and this is the technique used by the standard getContents, readFile and friends). However, as convenient as this is, it exposes pure code to possible I/O errors and makes releasing resources (such as file handles) non-deterministic. This is why most "serious" Haskell applications (especially those concerned with efficiency) nowadays use things called Enumerators and Iteratees for streaming I/O. One library in Hackage that implements this concept is enumerator.
You are probably fine with using lazy I/O in your application, but I thought I'd still give this as an example of another way to approach these kinds of problems. You can find more in-depth tutorials about iteratees here and here.
For example, your stream of DataFiles could be implemented as an Enumerator like this:
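The enumerator package's own types are not reproduced here. As a rough, library-free illustration of the underlying idea only (a producer that pushes each element into a consumer step and stops deterministically at the end of the chain), one could write something like the following. The name foldChainM is hypothetical and this is not the enumerator API.

```haskell
-- | Push every element of a chain into a consumer step, threading an
-- accumulator. Nothing is deferred: each action runs before the next one.
-- 'next' plays the role of getNextFile; 'step' is the consumer.
foldChainM :: Monad m
           => (a -> m (Maybe a))   -- how to find the next element
           -> (acc -> a -> m acc)  -- what to do with each element
           -> acc                  -- initial accumulator
           -> a                    -- first element of the chain
           -> m acc
foldChainM next step acc x = do
  acc' <- step acc x
  r <- next x
  case r of
    Nothing -> return acc'
    Just y  -> foldChainM next step acc' y
```

With real files, step would process one DataFile at a time, so only the current file needs to be held and any handle can be closed before moving on.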