Set, similarly to [], has perfectly well-defined monadic operations. The problem is that they require the values to satisfy an Ord constraint, so it's impossible to define return and >>= without any constraints. The same problem applies to many other data structures that require some kind of constraint on their possible values.

The standard trick (suggested to me in a haskell-cafe post) is to wrap Set into the continuation monad. ContT doesn't care whether the underlying type functor has any constraints. The constraints become necessary only when wrapping/unwrapping Sets into/from continuations:
import Control.Monad.Cont
import Data.Foldable (foldrM)
import Data.Set (Set, empty, fromList, singleton, union)
import qualified Data.Set as Set

-- return and bind for Set; bind needs Ord on the result type
setReturn :: a -> Set a
setReturn = singleton

setBind :: (Ord b) => Set a -> (a -> Set b) -> Set b
setBind set f = Set.foldl' (\s -> union s . f) empty set

-- Set wrapped in the continuation monad; no constraint appears here
type SetM r a = ContT r Set a

fromSet :: (Ord r) => Set a -> SetM r a
fromSet = ContT . setBind

toSet :: SetM r r -> Set r
toSet c = runContT c setReturn
This works as needed. For example, we can simulate a non-deterministic function that either increases its argument by 1 or leaves it intact:
step :: (Ord r) => Int -> SetM r Int
step i = fromSet $ fromList [i, i + 1]
-- repeated application of step:
stepN :: Int -> Int -> Set Int
stepN times start = toSet $ foldrM ($) start (replicate times step)
Indeed, stepN 5 0 yields fromList [0,1,2,3,4,5]. If we used the [] monad instead, we would get

[0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5]

instead.
The problem is efficiency. If we call stepN 20 0, the output takes a few seconds to appear, and stepN 30 0 doesn't finish within a reasonable amount of time. It turns out that all the Set.union operations are performed at the end, instead of being performed after each monadic computation. The result is that exponentially many Sets are constructed and unioned only at the end, which is unacceptable for most tasks.
Is there any way around this, to make the construction efficient? I tried, but without success.

(I even suspect that there could be some theoretical limits following from the Curry-Howard isomorphism and Glivenko's theorem. Glivenko's theorem says that for any propositional tautology φ the formula ¬¬φ can be proved in intuitionistic logic. However, I suspect that the length of the proof (in normal form) can be exponentially long. So, perhaps, there could be cases when wrapping a computation into the continuation monad makes it exponentially longer?)
I don't think your performance problems in this case are due to the use of Cont. A direct implementation gets similar performance to the Cont based implementation but occurs entirely in the Set "restricted monad".
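For instance, something along these lines (a sketch reusing setReturn and setBind from the question; foldrM' is a hypothetical restricted-monad analogue of foldrM):

-- step without the ContT wrapper
step' :: Int -> Set Int
step' i = fromList [i, i + 1]

-- foldrM rewritten against setReturn/setBind instead of a Monad instance
foldrM' :: Ord b => (a -> b -> Set b) -> b -> [a] -> Set b
foldrM' f z0 xs = foldl c setReturn xs z0
  where c k x z = f x z `setBind` k

stepN' :: Int -> Int -> Set Int
stepN' times start = foldrM' ($) start (replicate times step')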
"restricted monad"I am not sure if I believe your claim about Glivenko's theorem leading to exponential increase in (normalized) proof size--at least in the Call-By-Need context. That is because we can arbitrarily reuse subproofs (and our logic is second order, we need only a single proof of
forall a. ~~(a \/ ~a)
). Proofs are not trees, they are graphs (sharing).In general, you are likely to see performance costs from
Cont
wrappingSet
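For example (a sketch built from the question's fromSet and toSet; the name smash is illustrative):

-- run the continuation, forcing the pending unions, and re-wrap
smash :: (Ord r, Ord k) => SetM r r -> SetM k r
smash = fromSet . toSet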
Monads are one particular way of structuring and sequencing computations. The bind of a monad cannot magically restructure your computation so as to happen in a more efficient way. There are two problems with the way you structure your computation.
When evaluating stepN 20 0, the result of step 0 will be computed 20 times. This is because each step of the computation produces 0 as one alternative, which is then fed to the next step, which also produces 0 as an alternative, and so on. Perhaps a bit of memoization here can help.
A much bigger problem is the effect of ContT on the structure of your computation. With a bit of equational reasoning, expanding out the result of replicate 20 step and the definition of foldrM, and simplifying as many times as necessary, we can see that stepN 20 0 is equivalent to:
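Schematically, with 20 occurrences of step in total:

(...((return 0 >>= step) >>= step)...) >>= step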
(>>=)
is an elementary computation, namelystep
, rather than a composed one. However, zooming in on the definition of(>>=)
forContT
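-- (>>=) for ContT, as defined in the transformers package:
m >>= k = ContT $ \c -> runContT m (\x -> runContT (k x) c)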
We see that when evaluating a chain of (>>=) associating to the left, each bind will push a new computation onto the current continuation c. To illustrate what is going on, we can again use a bit of equational reasoning, expanding out this definition of (>>=) and the definition of runContT, and simplifying, yielding:
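Schematically, writing step x here for the underlying set fromList [x, x + 1]:

setReturn 0 `setBind`
  (\x1 -> step x1 `setBind`
    (\x2 -> step x2 `setBind`
      (\x3 -> ...)))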
Now, for each occurrence of setBind, let's ask ourselves what the RHS argument is. For the leftmost occurrence, the RHS argument is the whole rest of the computation after setReturn 0. For the second occurrence, it's everything after step x1, etc. Let's zoom in to the definition of setBind:
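setBind :: (Ord b) => Set a -> (a -> Set b) -> Set b
setBind set f = Set.foldl' (\s -> union s . f) empty set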
Here f represents all the rest of the computation, everything on the right hand side of an occurrence of setBind. That means that at each step, we are capturing the rest of the computation as f, and applying f as many times as there are elements in set. The computations are not elementary as before, but rather composed, and these computations will be duplicated many times.

The crux of the problem is that the ContT monad transformer is transforming the initial structure of the computation, which you meant as a left associative chain of setBind's, into a computation with a different structure, i.e. a right associative chain. This is after all perfectly fine, because one of the monad laws says that, for every m, f and g we have:
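(m >>= f) >>= g  =  m >>= (\x -> f x >>= g)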
However, the monad laws do not impose that the complexity remain the same on each side of the equations of each law. And indeed, in this case, the left associative way of structuring this computation is a lot more efficient: the left associative chain of setBind's evaluates in no time, because only elementary subcomputations are duplicated.

It turns out that other solutions shoehorning Set into a monad also suffer from the same problem. In particular, the set-monad package yields similar runtimes. The reason is that it, too, rewrites left associative expressions into right associative ones.

I think you have put your finger on a very important yet rather subtle problem with insisting that Set obeys a Monad interface. And I don't think it can be solved. The problem is that the type of the bind of a monad needs to be
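(>>=) :: m a -> (a -> m b) -> m b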
i.e. with no class constraint allowed on either a or b. That means that we cannot nest binds on the left without first invoking the monad laws to rewrite them into a right associative chain. Here's why: given (m >>= f) >>= g, the computation (m >>= f) has a type of the form m b. A value of the computation (m >>= f) is of type b. But because we can't hang any class constraint onto the type variable b, we can't know that the value we got satisfies an Ord constraint, and therefore cannot use this value as an element of a set on which we want to be able to compute unions.

I found another possibility, based on GHC's ConstraintKinds extension. The idea is to redefine Monad so that it includes a parametric constraint on the allowed values:
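A sketch of this approach (the names CMonad and Constr are illustrative):

{-# LANGUAGE ConstraintKinds, TypeFamilies, KindSignatures #-}

import Prelude hiding (return, (>>=))
import GHC.Exts (Constraint)
import Data.Set (Set)
import qualified Data.Set as Set

-- a Monad-like class carrying a per-instance constraint on values
class CMonad (m :: * -> *) where
    type Constr m a :: Constraint
    return :: Constr m a => a -> m a
    (>>=)  :: (Constr m a, Constr m b) => m a -> (a -> m b) -> m b

instance CMonad Set where
    type Constr Set a = Ord a
    return = Set.singleton
    m >>= f = Set.foldl' (\s x -> Set.union s (f x)) Set.empty m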
m (a -> b)
, because they can't satisfy constraints likeOrd (a -> b)
. So one can't use combinators like<*>
(orap
) for this constrainedSet
monad.)Recently on Haskell Cafe Oleg gave an example how to implement the
Set
monad efficiently. Quoting:
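A sketch of the idea (lightly adapted, with the Functor/Applicative instances modern GHC requires; values are buffered in plain lists, and real Sets are built, performing the unions, only where an Ord dictionary is at hand):

{-# LANGUAGE GADTs #-}

import qualified Data.Set as S

-- either an actual Set (when Ord is available)
-- or a plain list of candidate values (no constraint needed)
data SetMonad a where
    SMOrd :: Ord a => S.Set a -> SetMonad a
    SMAny :: [a] -> SetMonad a

smToList :: SetMonad a -> [a]
smToList (SMOrd x) = S.toList x
smToList (SMAny x) = x

instance Functor SetMonad where
    fmap f = SMAny . map f . smToList

instance Applicative SetMonad where
    pure x  = SMAny [x]
    f <*> x = f >>= \g -> fmap g x

instance Monad SetMonad where
    m >>= f = collect (map f (smToList m))

-- union the partial results, switching to a real Set as soon as
-- some piece carries an Ord dictionary
collect :: [SetMonad a] -> SetMonad a
collect []  = SMAny []
collect [x] = x
collect (SMOrd x : t) = case collect t of
    SMOrd y -> SMOrd (S.union x y)
    SMAny y -> SMOrd (S.union x (S.fromList y))
collect (SMAny x : t) = case collect t of
    SMOrd y -> SMOrd (S.union y (S.fromList x))
    SMAny y -> SMAny (x ++ y)

runSet :: Ord a => SetMonad a -> S.Set a
runSet (SMOrd x) = x
runSet (SMAny x) = S.fromList x

Because collect performs its unions at every bind, intermediate results stay small instead of piling up until the end.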