“Pattern matching” of algebraic type data construc

2020-06-02 10:37发布

问题:

Let's consider a data type with many constructors:

data T = Alpha Int | Beta Int | Gamma Int Int | Delta Int

I want to write a function to check if two values are produced with the same constructor:

sameK (Alpha _) (Alpha _) = True
sameK (Beta _) (Beta _) = True
sameK (Gamma _ _) (Gamma _ _) = True
sameK _ _ = False

Maintaining sameK is not much fun, it cannot be checked for correctness easily. For example, when new constructors are added to T, it's easy to forget to update sameK. I omitted one line to give an example:

-- it’s easy to forget:
-- sameK (Delta _) (Delta _) = True

The question is how to avoid boilerplate in sameK? Or how to make sure it checks for all T constructors?


The workaround I found is to use separate data types for each of the constructors, deriving Data.Typeable, and declaring a common type class, but I don't like this solution, because it is much less readable and otherwise just a simple algebraic type works for me:

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable

class Tlike t where
  value :: t -> t
  value = id

data Alpha = Alpha Int deriving Typeable
data Beta = Beta Int deriving Typeable
data Gamma = Gamma Int Int deriving Typeable
data Delta = Delta Int deriving Typeable

instance Tlike Alpha
instance Tlike Beta
instance Tlike Gamma
instance Tlike Delta

sameK :: (Tlike t, Typeable t, Tlike t', Typeable t') => t -> t' -> Bool
sameK a b = typeOf a == typeOf b

回答1:

Look at the Data.Data module, the toConstr function in particular. Along with {-# LANGUAGE DeriveDataTypeable #-} that will get you a 1-line solution which works for any type which is an instance of Data.Data. You don't need to figure out all of SYB!

If, for some reason (stuck with Hugs?), that is not an option, then here is a very ugly and very slow hack. It works only if your datatype is Showable (e.g. by using deriving (Show) - which means no function types inside, for example).

constrT :: T -> String
constrT = head . words . show
sameK x y = constrT x == constrT y

constrT gets the string representation of the outermost constructor of a T value by showing it, chopping it up into words and then getting the first. I give an explicit type signature so you're not tempted to use it on other types (and to evade the monomorphism restriction).

Some notable disadvantages:

  • This breaks horribly when your type has infix constructors (such as data T2 = Eta Int | T2 :^: T2)
  • If some of your constructors have a shared prefix, this is going to get slower, as a larger part of the strings has to be compared.
  • Doesn't work on types with a custom show, such as many library types.

That said, it is Haskell 98... but that's about the only nice thing I can say about it!



回答2:

Another possible way:

sameK x y = f x == f y
  where f (Alpha _)   = 0
        f (Beta _)    = 1
        f (Gamma _ _) = 2
        -- runtime error when Delta value encountered

A runtime error is not ideal, but better than silently giving the wrong answer.



回答3:

You'll need to use a generics library like Scrap Your Boilerplate or uniplate to do this in general.

If you don't want to be so heavy-handed, you can use Dave Hinton's solution, together with the empty record shortcut:

...
where f (Alpha {}) = 0
      f (Beta {}) = 1
      f (Gamma {}) = 2

So you don't have to know how many args each constructor has. But it obviously still leaves something to be desired.



回答4:

In some cases, "Scrap Your Boilerplate" library will help.

http://www.haskell.org/haskellwiki/Scrap_your_boilerplate



回答5:

You can definitely use the generics to eliminate the boilerplate. Your code is a textbook example why I (and many others never use the _ wildcard at top level). While it is tedious to write out all the cases, it is less tedious than dealing with the bugs.

In this happy example I would not only use Dave Hinton's solution but would slap an INLINE pragma on the auxiliary function f.