How to enumerate a recursive datatype in Haskell?

2019-01-09 02:22发布

问题:

This blog post has an interesting explanation of how to use the Omega monad to enumerate an arbitrary grammar diagonally. He offers an example of how to do so, resulting in an infinite sequence of strings. I'd like to do the same, except that, instead of generating a list of strings, it generates a list of an actual datatype. For example,

 data T = A | B T | C T T

Would generate

A, B A, C A A, C (B A) A... 

Or something similar. Unfortunately my Haskell skills are still maturing and after some hours playing it I couldn't manage to do what I want. How can that be done?

As requested, one of my attempts (I have tried too many things...):

import Control.Monad.Omega

data T = A | B T | C T T deriving (Show)

a = [A] 
        ++ (do { x <- each a; return (B x) })
        ++ (do { x <- each a; y <- each a; return (C x y) })

main = print $ take 10 $ a

回答1:

My first ugly approach was:

allTerms :: Omega T
allTerms = do
  which <- each [ 1,2,3 ]
  if which == 1 then
    return A
  else if which == 2 then do
    x <- allTerms
    return $ B x
  else do
    x <- allTerms
    y <- allTerms
    return $ C x y

But then, after some cleaning up I reached this one liner

import Control.Applicative
import Control.Monad.Omega
import Control.Monad

allTerms :: Omega T
allTerms = join $ each [return A, B <$> allTerms, C <$> allTerms <*> allTerms]

Note that order matters: return A has to be the first choice in the list above, or allTerms will not terminate. Basically, the Omega monad ensures a "fair scheduling" among choices, saving you from e.g. infiniteList ++ something, but does not prevent infinite recursion.


An even more elegant solution was suggested by Crazy FIZRUK, exploiting the Alternative instance of Omega.

import Control.Applicative
import Data.Foldable (asum)
import Control.Monad.Omega

allTerms :: Omega T
allTerms = asum [ pure A
                , B <$> allTerms
                , C <$> allTerms <*> allTerms
                ]


回答2:

I finally found the time to write a generic version. It uses the Universe typeclass, which represents recursively enumerabley types. Here it is:

{-# LANGUAGE DeriveGeneric, TypeOperators, ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances, OverlappingInstances #-}

import Data.Universe
import Control.Monad.Omega
import GHC.Generics
import Control.Monad (mplus, liftM2)

class GUniverse f where
    guniverse :: [f a]

instance GUniverse U1 where
    guniverse = [U1]

instance (Universe c) => GUniverse (K1 i c) where
    guniverse = fmap K1 (universe :: [c])

instance (GUniverse f) => GUniverse (M1 i c f) where
    guniverse = fmap M1 (guniverse :: [f p])

instance (GUniverse f, GUniverse g) => GUniverse (f :*: g) where
    guniverse = runOmega $ liftM2 (:*:) ls rs
        where ls = each (guniverse :: [f p])
              rs = each (guniverse :: [g p])

instance (GUniverse f, GUniverse g) => GUniverse (f :+: g) where
    guniverse = runOmega $ (fmap L1 $ ls) `mplus` (fmap R1 $ rs)
        where ls = each (guniverse :: [f p])
              rs = each (guniverse :: [g p])

instance (Generic a, GUniverse (Rep a)) => Universe a where
    universe = fmap to $ (guniverse :: [Rep a x])


data T = A | B T | C T T deriving (Show, Generic)
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show, Generic)

I couldn't find a way to remove UndecidableInstances, but that should be of no greater concern. OverlappingInstances is only required to override predefined Universe instances, like Either's. Now some nice outputs:

*Main> take 10 $ (universe :: [T])
[A,B A,B (B A),C A A,B (B (B A)),C A (B A),B (C A A),C (B A) A,B (B (B (B A))),C A (B (B A))]
*Main> take 20 $ (universe :: [Either Int Char])
[Left (-9223372036854775808),Right '\NUL',Left (-9223372036854775807),Right '\SOH',Left (-9223372036854775806),Right '\STX',Left (-9223372036854775805),Right '\ETX',Left (-9223372036854775804),Right '\EOT',Left (-9223372036854775803),Right '\ENQ',Left (-9223372036854775802),Right '\ACK',Left (-9223372036854775801),Right '\a',Left (-9223372036854775800),Right '\b',Left (-9223372036854775799),Right '\t']
*Main> take 10 $ (universe :: [Tree Bool])
[Leaf False,Leaf True,Branch (Leaf False) (Leaf False),Branch (Leaf False) (Leaf True),Branch (Leaf True) (Leaf False),Branch (Leaf False) (Branch (Leaf False) (Leaf False)),Branch (Leaf True) (Leaf True),Branch (Branch (Leaf False) (Leaf False)) (Leaf False),Branch (Leaf False) (Branch (Leaf False) (Leaf True)),Branch (Leaf True) (Branch (Leaf False) (Leaf False))]

I'm not exactly sure what happens in the branching order of mplus, but I think it should all work out if Omega is correctly implemented, which I strongly believe.


But wait! The above implementation is not yet bug-free; it diverges on "left recursive" types, like this:

data T3 = T3 T3 | T3' deriving (Show, Generic)

while this works:

data T6 = T6' | T6 T6 deriving (Show, Generic)

I'll see if I can fix that. EDIT: At some time, the solution of this problem might be found in this question.



回答3:

You really should show us what you have tried so far. But granted, this is not an easy problem for a bgeinner.

Let's try to write a naive version down:

enum = A : (map B enum ++ [ C x y | x <- enum, y <- enum ])

Ok, this actually gives us:

[A, B A, B (B A), B (B (B A)), .... ]

and never reaches the C values.

We obviously need to construct the list in steps. Say we already have a complete list of items up to a certain nesting level, we can compute the items with one nesting level more in one step:

step xs = map B xs ++ [ C x y | x <- xs, y <- xs ]

For example, we get:

> step [A]
[B A,C A A]
> step (step [A])
[B (B A),B (C A A),C (B A) (B A),C (B A) (C A A),C (C A A) (B A),C (C A A) (C A ...

What we want is thus:

[A] ++ step [A] ++ step (step [A]) ++ .....

which is the concatenation of the result of

iterate step [A]

which is, of course

someT = concat (iterate step [A])

Warning: You will notice that this still does not give all values. For example:

C A (B (B A))

will be missing.

Can you find out why? Can you improve it?



回答4:

Below is a terrible solution, but perhaps an interesting one.


We might consider the idea of adding "one more layer"

grow :: T -> Omega T
grow t = each [A, B t, C t t]

which is close to correct but has a flaw—in particular, in the C branch, we end up having both of the arguments take the exact same values instead of being able to vary independently. We can fix this by computing the "base functor" of T which looks like this

data T    = A  | B  T | C  T T
data Tf x = Af | Bf x | Cf x x deriving Functor

In particular, Tf is just a copy of T where the recursive calls are functor "holes" instead of direct recursive calls. Now we can write:

grow :: Omega T -> Omega (Tf (Omega T))
grow ot = each [ Af, Bf ot, Cf ot ot ]

which has a whole computation of a new set of T in each hole. If we could somehow "flatten" the Omega (Tf (Omega T)) into Omega T then we'd have a computation which adds "one new layer" to our Omega computation correctly.

flatten :: Omega (Tf (Omega T)) -> Omega T
flatten = ...

and we could take the fixed point of this layering with fix

fix :: (a -> a) -> a

every :: Omega T
every = fix (flatten . grow)

So the only trick is to figure out flatten. To do this we need to notice two features of Tf. First, it's Traversable so we can use sequenceA to "flip" the order of Tf and Omega

flatten = ?f . fmap (?g . sequenceA)

where ?f :: Omega (Omega T) -> Omega T is just join. The final tricky bit is figuring out ?g :: Omega (Tf T) -> Omega T. Obviously, we don't care about the Omega layer so we should just fmap a function of type Tf T -> T.

And this function is very close to the defining notion for the relationship between Tf and T: we can always compress a layer of Tf on the top of T.

compress :: Tf T -> T
compress Af         = A
compress (Bf t)     = B t
compress (Cf t1 t2) = C t1 t2

All together we have

flatten :: Omega (Tf (Omega T)) -> Omega T
flatten = join . fmap (fmap compress . sequenceA)

Ugly, but all together functional.