Don Stewart's Haskell in the Large's presentation mentioned Phantom Types:
data Ratio n = Ratio Double
1.234 :: Ratio D3
data Ask ccy = Ask Double
Ask 1.5123 :: Ask GBP
I read over his bullet points about them, but I did not understand them. In addition, I read the Haskell Wiki on the topic. Yet I still am missing their point.
What's the motivation to use a phantom type?
To answer the "what's the motivation to use a phantom type". There is two points:
- to make invalid states unrepresentable, which is well explained in Aadit's answer
- Carry some of the information on the type level
For example you could have distances tagged by the length unit:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
newtype Distance a = Distance Double
deriving (Num, Show)
data Kilometer
data Mile
marathonDistance :: Distance Kilometer
marathonDistance = Distance 42.195
distanceKmToMiles :: Distance Kilometer -> Distance Mile
distanceKmToMiles (Distance km) = Distance (0.621371 * km)
marathonDistanceInMiles :: Distance Mile
marathonDistanceInMiles = distanceKmToMiles marathonDistance
And you can avoid Mars Climate Orbiter disaster:
>>> marathonDistanceInMiles
Distance 26.218749345
>>> marathonDistanceInMiles + marathonDistance
<interactive>:10:27:
Couldn't match type ‘Kilometer’ with ‘Mile’
Expected type: Distance Mile
Actual type: Distance Kilometer
In the second argument of ‘(+)’, namely ‘marathonDistance’
In the expression: marathonDistanceInMiles + marathonDistance
There are slight varitions to this "pattern". You can use DataKinds
to have closed set of units:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
data LengthUnit = Kilometer | Mile
newtype Distance (a :: LengthUnit) = Distance Double
deriving (Num, Show)
marathonDistance :: Distance 'Kilometer
marathonDistance = Distance 42.195
distanceKmToMiles :: Distance 'Kilometer -> Distance 'Mile
distanceKmToMiles (Distance km) = Distance (0.621371 * km)
marathonDistanceInMiles :: Distance 'Mile
marathonDistanceInMiles = distanceKmToMiles marathonDistance
And it will work similarly:
>>> marathonDistanceInMiles
Distance 26.218749345
>>> marathonDistance + marathonDistance
Distance 84.39
>>> marathonDistanceInMiles + marathonDistance
<interactive>:28:27:
Couldn't match type ‘'Kilometer’ with ‘'Mile’
Expected type: Distance 'Mile
Actual type: Distance 'Kilometer
In the second argument of ‘(+)’, namely ‘marathonDistance’
In the expression: marathonDistanceInMiles + marathonDistance
But now the Distance
can be only in kilometers or miles, we can't add more units later. That might be useful in some use cases.
We could also do:
data Distance = Distance { distanceUnit :: LengthUnit, distanceValue :: Double }
deriving (Show)
In the distance case we can work out the addition, for example translate to kilometers if different units are involved. But this doesn't work well for currencies which ratio isn't constant over time etc.
And it's possible to use GADTs for that instead, which may be simpler approach in some situations:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}
data Kilometer
data Mile
data Distance a where
KilometerDistance :: Double -> Distance Kilometer
MileDistance :: Double -> Distance Mile
deriving instance Show (Distance a)
marathonDistance :: Distance Kilometer
marathonDistance = KilometerDistance 42.195
distanceKmToMiles :: Distance Kilometer -> Distance Mile
distanceKmToMiles (KilometerDistance km) = MileDistance (0.621371 * km)
marathonDistanceInMiles :: Distance Mile
marathonDistanceInMiles = distanceKmToMiles marathonDistance
Now we know the unit also on the value level:
>>> marathonDistanceInMiles
MileDistance 26.218749345
This approach especially greately simplifies Expr a
example from Aadit's answer:
{-# LANGUAGE GADTs #-}
data Expr a where
Number :: Int -> Expr Int
Boolean :: Bool -> Expr Bool
Increment :: Expr Int -> Expr Int
Not :: Expr Bool -> Expr Bool
It's worth pointing out that the latter variations require non-trivial language extensions (GADTs
, DataKinds
, KindSignatures
), which might not be supported in your compiler. That's might be the case with Mu compiler Don mentions.
The motivation behind using phantom types is to specialize the return type of data constructors. For example, consider:
data List a = Nil | Cons a (List a)
The return type of both Nil
and Cons
is List a
by default (which is generalized for all lists of type a
).
Nil :: List a
Cons :: a -> List a -> List a
|____|
|
-- return type is generalized
Also note that Nil
is a phantom constructor (i.e. its return type doesn't depend upon its arguments, vacuously in this case, but nonetheless the same).
Because Nil
is a phantom constructor we can specialize Nil
to any type we want (e.g. Nil :: List Int
or Nil :: List Char
).
Normal algebraic data types in Haskell allow you to choose the type of the arguments of a data constructor. For example, we chose the type of arguments for Cons
above (a
and List a
).
However, it doesn't allow you to choose the return type of a data constructor. The return type is always generalized. This is fine for most cases. However, there are exceptions. For example:
data Expr a = Number Int
| Boolean Bool
| Increment (Expr Int)
| Not (Expr Bool)
The type of the data constructors are:
Number :: Int -> Expr a
Boolean :: Bool -> Expr a
Increment :: Expr Int -> Expr a
Not :: Expr Bool -> Expr a
As you can see, the return type of all the data constructors are generalized. This is problematic because we know that Number
and Increment
must always return an Expr Int
and Boolean
and Not
must always return an Expr Bool
.
The return types of the data constructors are wrong because they are too general. For example, Number
cannot possibly return an Expr a
but yet it does. This allows you to write wrong expressions which the type checker won't catch. For example:
Increment (Boolean False) -- you shouldn't be able to increment a boolean
Not (Number 0) -- you shouldn't be able to negate a number
The problem is that we can't specify the return type of data constructors.
Notice that all the data constructors of Expr
are phantom constructors (i.e. their return type doesn't depend upon their arguments). A data type whose constructors are all phantom constructors is called a phantom type.
Remember that the return type of phantom constructors like Nil
can be specialized to any type we want. Hence, we can create smart constructors for Expr
as follows:
number :: Int -> Expr Int
boolean :: Bool -> Expr Bool
increment :: Expr Int -> Expr Int
not :: Expr Bool -> Expr Bool
number = Number
boolean = Boolean
increment = Increment
not = Not
Now we can use the smart constructors instead of the normal constructors and our problem is solved:
increment (boolean False) -- error
not (number 0) -- error
So phantom constructors are useful when you want to specialize the return type of a data constructor and phantom types are data types whose constructors are all phantom constructors.
Note that data constructors like Left
and Right
are also phantom constructors:
data Either a b = Left a | Right b
Left :: a -> Either a b
Right :: b -> Either a b
The reason is that although the return type of these data constructors do depend upon their arguments yet they are still generalized because they only partially depend upon their arguments.
Simple way to know if a data constructor is a phantom constructor:
Do all the type variables appearing in the return type of the data constructor also appear in the arguments of the data constructor? If yes, it's not a phantom constructor.
Hope that helps.
For Ratio D3
specifically, we use rich types like that to drive type-directed code, so e.g. if you have a field somewhere at type Ratio D3
, its editor is dispatched to a text field accepting numeric entries only and showing a precision of 3 digits. This is in contrast, e.g., with newtype Amount = Amount Double
where we don't show decimal digits, but use thousand commas and parse input like '10m' as '10,000,000'.
In the underlying representation, both are still just Double
s.