Assume we have an IO action such as
lookupStuff :: InputType -> IO OutputType
which could be something simple such as DNS lookup, or some web-service call against a time-invariant data.
Let's assume that:
The operation never throws any exception and/or never diverges
If it wasn't for the
IO
monad, the function would be pure, i.e. the result is always the same for equal input parametersThe action is reentrant, i.e. it can be called from multiple threads at the same time safely.
The
lookupStuff
operation is quite (time-)expensive.
The problem I'm facing is how to properly (and w/o using any unsafe*IO*
cheat) implement a reentrant cache, that can be called from multiple threads, and coalesces multiple queries for the same input-parameters into a single request.
I guess I'm after something similiar as GHC's blackhole-concept for pure computations but in the IO "calculation" context.
What is the idiomatic Haskell/GHC solution for the stated problem?
Yeah, basically reimplement the logic. Although it seems similar to what GHC is already doing, that's GHC's choice. Haskell can be implemented on VMs that work very differently, so in that sense it isn't already done for you.
But yeah, just use an
MVar (Map InputType OutputType)
or even anIORef (Map InputType OutputType)
(make sure to modify withatomicModifyIORef
), and just store the cache in there. If this imperative solution seems wrong, it's the "if not for theIO
, this function would be pure" constraint. If it were just an arbitraryIO
action, then the idea that you would have to keep state in order to know what to execute or not seems perfectly natural. The problem is that Haskell does not have a type for "pure IO" (which, if it depends on a database, it is just behaving pure under certain conditions, which is not the same as being a hereditarily pure).Yeah it's ugly on the inside. But on the outside, look at that! It's just like the type of a pure memoization function, except for it has
IO
stained all over it.Here's some code implementing more or less what I was after in my original question:
The code above builds upon Simon Marlow's
Async
abstraction as described in Tutorial: Parallel and Concurrent Programming in Haskell: