Motivation behind Phantom Types?

2019-01-30 23:24发布

问题:

Don Stewart's Haskell in the Large's presentation mentioned Phantom Types:

data Ratio n = Ratio Double
1.234 :: Ratio D3

data Ask ccy = Ask Double
Ask 1.5123 :: Ask GBP

I read over his bullet points about them, but I did not understand them. In addition, I read the Haskell Wiki on the topic. Yet I still am missing their point.

What's the motivation to use a phantom type?

回答1:

To answer the "what's the motivation to use a phantom type". There is two points:

  • to make invalid states unrepresentable, which is well explained in Aadit's answer
  • Carry some of the information on the type level

For example you could have distances tagged by the length unit:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Distance a = Distance Double
  deriving (Num, Show)

data Kilometer
data Mile

marathonDistance :: Distance Kilometer
marathonDistance = Distance 42.195

distanceKmToMiles :: Distance Kilometer -> Distance Mile
distanceKmToMiles (Distance km) = Distance (0.621371 * km)

marathonDistanceInMiles :: Distance Mile
marathonDistanceInMiles = distanceKmToMiles marathonDistance

And you can avoid Mars Climate Orbiter disaster:

>>> marathonDistanceInMiles
Distance 26.218749345

>>> marathonDistanceInMiles + marathonDistance

<interactive>:10:27:
    Couldn't match type ‘Kilometer’ with ‘Mile’
    Expected type: Distance Mile
      Actual type: Distance Kilometer
    In the second argument of ‘(+)’, namely ‘marathonDistance’
    In the expression: marathonDistanceInMiles + marathonDistance

There are slight varitions to this "pattern". You can use DataKinds to have closed set of units:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}

data LengthUnit = Kilometer | Mile

newtype Distance (a :: LengthUnit) = Distance Double
  deriving (Num, Show)

marathonDistance :: Distance 'Kilometer
marathonDistance = Distance 42.195

distanceKmToMiles :: Distance 'Kilometer -> Distance 'Mile
distanceKmToMiles (Distance km) = Distance (0.621371 * km)

marathonDistanceInMiles :: Distance 'Mile
marathonDistanceInMiles = distanceKmToMiles marathonDistance

And it will work similarly:

>>> marathonDistanceInMiles
Distance 26.218749345

>>> marathonDistance + marathonDistance
Distance 84.39

>>> marathonDistanceInMiles + marathonDistance

<interactive>:28:27:
    Couldn't match type ‘'Kilometer’ with ‘'Mile’
    Expected type: Distance 'Mile
      Actual type: Distance 'Kilometer
    In the second argument of ‘(+)’, namely ‘marathonDistance’
    In the expression: marathonDistanceInMiles + marathonDistance

But now the Distance can be only in kilometers or miles, we can't add more units later. That might be useful in some use cases.


We could also do:

data Distance = Distance { distanceUnit :: LengthUnit, distanceValue :: Double }
   deriving (Show)

In the distance case we can work out the addition, for example translate to kilometers if different units are involved. But this doesn't work well for currencies which ratio isn't constant over time etc.


And it's possible to use GADTs for that instead, which may be simpler approach in some situations:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}

data Kilometer
data Mile

data Distance a where
  KilometerDistance :: Double -> Distance Kilometer
  MileDistance :: Double -> Distance Mile

deriving instance Show (Distance a)

marathonDistance :: Distance Kilometer
marathonDistance = KilometerDistance 42.195

distanceKmToMiles :: Distance Kilometer -> Distance Mile
distanceKmToMiles (KilometerDistance km) = MileDistance (0.621371 * km)

marathonDistanceInMiles :: Distance Mile
marathonDistanceInMiles = distanceKmToMiles marathonDistance

Now we know the unit also on the value level:

>>> marathonDistanceInMiles 
MileDistance 26.218749345

This approach especially greately simplifies Expr a example from Aadit's answer:

{-# LANGUAGE GADTs #-}

data Expr a where
  Number     :: Int -> Expr Int
  Boolean    :: Bool -> Expr Bool
  Increment  :: Expr Int -> Expr Int
  Not        :: Expr Bool -> Expr Bool

It's worth pointing out that the latter variations require non-trivial language extensions (GADTs, DataKinds, KindSignatures), which might not be supported in your compiler. That's might be the case with Mu compiler Don mentions.



回答2:

The motivation behind using phantom types is to specialize the return type of data constructors. For example, consider:

data List a = Nil | Cons a (List a)

The return type of both Nil and Cons is List a by default (which is generalized for all lists of type a).

Nil  ::                List a
Cons :: a -> List a -> List a
                       |____|
                          |
                    -- return type is generalized

Also note that Nil is a phantom constructor (i.e. its return type doesn't depend upon its arguments, vacuously in this case, but nonetheless the same).

Because Nil is a phantom constructor we can specialize Nil to any type we want (e.g. Nil :: List Int or Nil :: List Char).


Normal algebraic data types in Haskell allow you to choose the type of the arguments of a data constructor. For example, we chose the type of arguments for Cons above (a and List a).

However, it doesn't allow you to choose the return type of a data constructor. The return type is always generalized. This is fine for most cases. However, there are exceptions. For example:

data Expr a = Number     Int
            | Boolean    Bool
            | Increment (Expr Int)
            | Not       (Expr Bool)

The type of the data constructors are:

Number    :: Int       -> Expr a
Boolean   :: Bool      -> Expr a
Increment :: Expr Int  -> Expr a
Not       :: Expr Bool -> Expr a

As you can see, the return type of all the data constructors are generalized. This is problematic because we know that Number and Increment must always return an Expr Int and Boolean and Not must always return an Expr Bool.

The return types of the data constructors are wrong because they are too general. For example, Number cannot possibly return an Expr a but yet it does. This allows you to write wrong expressions which the type checker won't catch. For example:

Increment (Boolean False) -- you shouldn't be able to increment a boolean
Not       (Number  0)     -- you shouldn't be able to negate a number

The problem is that we can't specify the return type of data constructors.


Notice that all the data constructors of Expr are phantom constructors (i.e. their return type doesn't depend upon their arguments). A data type whose constructors are all phantom constructors is called a phantom type.

Remember that the return type of phantom constructors like Nil can be specialized to any type we want. Hence, we can create smart constructors for Expr as follows:

number    :: Int       -> Expr Int
boolean   :: Bool      -> Expr Bool
increment :: Expr Int  -> Expr Int
not       :: Expr Bool -> Expr Bool

number    = Number
boolean   = Boolean
increment = Increment
not       = Not

Now we can use the smart constructors instead of the normal constructors and our problem is solved:

increment (boolean False) -- error
not       (number  0)     -- error

So phantom constructors are useful when you want to specialize the return type of a data constructor and phantom types are data types whose constructors are all phantom constructors.


Note that data constructors like Left and Right are also phantom constructors:

data Either a b = Left a | Right b

Left  :: a -> Either a b
Right :: b -> Either a b

The reason is that although the return type of these data constructors do depend upon their arguments yet they are still generalized because they only partially depend upon their arguments.

Simple way to know if a data constructor is a phantom constructor:

Do all the type variables appearing in the return type of the data constructor also appear in the arguments of the data constructor? If yes, it's not a phantom constructor.

Hope that helps.



回答3:

For Ratio D3 specifically, we use rich types like that to drive type-directed code, so e.g. if you have a field somewhere at type Ratio D3, its editor is dispatched to a text field accepting numeric entries only and showing a precision of 3 digits. This is in contrast, e.g., with newtype Amount = Amount Double where we don't show decimal digits, but use thousand commas and parse input like '10m' as '10,000,000'.

In the underlying representation, both are still just Doubles.