I have an ADT representing the AST for a simple language:
data UTerm = UTrue
| UFalse
| UIf UTerm UTerm UTerm
| UZero
| USucc UTerm
| UIsZero UTerm
This data structure can represent invalid terms that don't follow the type
rules of the language, like UIsZero UFalse
, so I'd like to use a GADT that
enforces well-typedness:
{-# LANGUAGE GADTs #-}
data TTerm a where
TTrue :: TTerm Bool
TFalse :: TTerm Bool
TIf :: TTerm Bool -> TTerm a -> TTerm a -> TTerm a
TZero :: TTerm Int
TSucc :: TTerm Int -> TTerm Int
TIsZero :: TTerm Int -> TTerm Bool
My problem is to type check a UTerm and convert it into a TTerm. My first
thought was UTerm -> Maybe (TTerm a)
, but this of course doesn't work because
it's not valid for all a
s. I don't even know what the type would be, because
we don't know if a
is going to be Int or Bool. Then I thought I could write a
different type checking function for each of the possible values of a
:
import Control.Applicative
typecheckbool :: UTerm -> Maybe (TTerm Bool)
typecheckbool UTrue = Just TTrue
typecheckbool UFalse = Just TFalse
typecheckbool (UIsZero a) = TIsZero <$> typecheckint a
typecheckbool _ = Nothing
typecheckint :: UTerm -> Maybe (TTerm Int)
typecheckint UZero = Just TZero
typecheckint (USucc a) = TSucc <$> typecheckint a
typecheckint (UIf a b c) = TIf <$> typecheckbool a <*> typecheckint b <*> typecheckint c
typecheckint UTrue = Nothing
typecheckint UFalse = Nothing
typecheckint (UIsZero _) = Nothing
This works for some cases, for a subset of the language where TIf requires its
consequent and alternative are Ints (But TIf TTrue TFalse TTrue
is actually
totally valid), and where we know the target type of the expression we're
typing.
What's the right way to convert from a UTerm to a TTerm?
The standard technique is to define an existential type:
data ETerm_ where
ETerm_ :: TTerm a -> ETerm
In this case, you may also want some term-level evidence of which type you have; e.g.
data Type a where
TInt :: Type Int
TBool :: Type Bool
then the real ETerm
would look like this:
data ETerm where
ETerm :: Type a -> TTerm a -> ETerm
The interesting case of type checking is then something like
typeCheck (UIf ucond ut uf) = do
ETerm TBool tcond <- typeCheck ucond
ETerm tyt tt <- typeCheck ut
ETerm tyf tf <- typeCheck uf
case (tyt, tyf) of
(TBool, TBool) -> return (ETerm TBool (TIf tcond tt tf))
(TInt , TInt ) -> return (ETerm TInt (TIf tcond tt tf))
_ -> fail "branches have different types"
As a minor complement of @DanielWagner's answer, you might want to factorize type equality checks such as
...
case (tyt, tyf) of
(TBool, TBool) -> return (ETerm TBool (TIf tcond tt tf))
(TInt , TInt ) -> return (ETerm TInt (TIf tcond tt tf))
_ -> fail "branches have different types"
One way to do that is using equality witnesses:
import Data.Type.Equality
typeEq :: Type a -> Type b -> Maybe (a :~: b)
typeEq TInt TInt = Just Refl
typeEq TBool TBool = Just Refl
typeEq _ _ = Nothing
typeCheck :: UTerm -> Maybe ETerm
typeCheck (UIf ucond ut uf) = do
ETerm TBool tcond <- typeCheck ucond
ETerm tyt tt <- typeCheck ut
ETerm tyf tf <- typeCheck uf
case typeEq tyt tyf of
Just Refl -> return (ETerm tyt (TIf tcond tt tf))
_ -> fail "branches have different types"
This factorization is convenient if you need to check type equality in multiple parts of your type checking routine. It also allows for extending the language with pair types like (t1,t2)
, which require a structural recursive approach to check type equality.
One might even write a full decider for type equality
{-# LANGUAGE EmptyCase #-}
typeEq2 :: Type a -> Type b -> Either (a :~: b) ((a :~:b) -> Void)
typeEq2 TInt TInt = Left Refl
typeEq2 TInt TBool = Right (\eq -> case eq of)
typeEq2 TBool TBool = Left Refl
typeEq2 TBool TInt = Right (\eq -> case eq of)
but this, I guess, will probably not be needed unless you are trying to model quite advanced types (e.g. GADTs).
The code above uses and empty case to examine all the possible values eq
can have. Since it has type e.g. Int :~: Bool
, and there are no constructors matching with that type, we are left with no possible values for eq
and so no case branch is needed. This will not trigger an exhaustiveness warning, since there are indeed no cases left unhandled (OT: I wish these warnings to be actual errors).
Instead of using EmptyCase
you can also use something like case eq of _ -> undefined
, but using bottoms in proof terms like the above one is fishy.