How to use SmallCheck in Haskell?

2019-02-08 06:44发布

站内文章 / 前沿技术

26 0

冷血范

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am trying to use SmallCheck to test a Haskell program, but I cannot understand how to use the library to test my own data types. Apparently, I need to use the Test.SmallCheck.Series. However, I find the documentation for it extremely confusing. I am interested in both cookbook-style solutions and an understandable explanation of the logical (monadic?) structure. Here are some questions I have (all related):

If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)? What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?

I managed to make quickCheck work like this without getting my hands too dirty by using judicious applications of Control.Monad.liftM to functions like makeTale. I couldn't figure out a way to do this with smallCheck (please explain it to me!).
What is the relationship between the types Serial, Series, etc.?
(optional) What is the point of coSeries? How do I use the Positive type from SmallCheck.Series?
(optional) Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.

If there is there any intro/tutorial to using smallCheck, I'd appreciate a link. Thank you very much!

UPDATE: I should add that the most useful and readable documentation I found for smallCheck is this paper (PDF). I could not find the answer to my questions there on the first look; it is more of a persuasive advertisement than a tutorial.

UPDATE 2: I moved my question about the weird Identity that shows up in the type of Test.SmallCheck.list and other places to a separate question.

回答1:

NOTE: This answer describes pre-1.0 versions of SmallCheck. See this blog post for the important differences between SmallCheck 0.6 and 1.0.

SmallCheck is like QuickCheck in that it tests a property over some part of the space of possible types. The difference is that it tries to exhaustively enumerate a series all of the "small" values instead of an arbitrary subset of smallish values.

As I hinted, SmallCheck's Serial is like QuickCheck's Arbitrary.

Now Serial is pretty simple: a Serial type a has a way (series) to generate a Series type which is just a function from Depth -> [a]. Or, to unpack that, Serial objects are objects we know how to enumerate some "small" values of. We are also given a Depth parameter which controls how many small values we should generate, but let's ignore it for a minute.

instance Serial Bool where series _ = [False, True]
instance Serial Char where series _ = "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
  series d = Nothing : map Just (series d)

In these cases we're doing nothing more than ignoring the Depth parameter and then enumerating "all" possible values for each type. We can even do this automatically for some types

instance (Enum a, Bounded a) => Serial a where series _ = [minBound .. maxBound]

This is a really simple way of testing properties exhaustively—literally test every single possible input! Obviously there are at least two major pitfalls, though: (1) infinite data types will lead to infinite loops when testing and (2) nested types lead to exponentially larger spaces of examples to look through. In both cases, SmallCheck gets really large really quickly.

So that's the point of the Depth parameter—it lets the system ask us to keep our Series small. From the documentation, Depth is the

Maximum depth of generated test values

For data values, it is the depth of nested constructor applications.

For functional values, it is both the depth of nested case analysis and the depth of results.

so let's rework our examples to keep them Small.

instance Serial Bool where 
  series 0 = []
  series 1 = [False]
  series _ = [False, True]
instance Serial Char where 
  series d = take d "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
  -- we shrink d by one since we're adding Nothing
  series d = Nothing : map Just (series (d-1))

instance (Enum a, Bounded a) => Serial a where series d = take d [minBound .. maxBound]

Much better.

So what's coseries? Like coarbitrary in the Arbitrary typeclass of QuickCheck, it lets us build a series of "small" functions. Note that we're writing the instance over the input type---the result type is handed to us in another Serial argument (that I'm below calling results).

instance Serial Bool where
  coseries results d = [\cond -> if cond then r1 else r2 | 
                        r1 <- results d
                        r2 <- results d]

these take a little more ingenuity to write and I'll actually refer you to use the alts methods which I'll describe briefly below.

So how can we make some Series of Persons? This part is easy

instance Series Person where
  series           d = SnowWhite : take (d-1) (map Dwarf [1..7])
  ...

But our coseries function needs to generate every possible function from Persons to something else. This can be done using the altsN series of functions provided by SmallCheck. Here's one way to write it

 coseries results d = [\person -> 
                         case person of
                           SnowWhite -> f 0
                           Dwarf n   -> f n
                       | f <- alts1 results d ]

The basic idea is that altsN results generates a Series of N-ary function from N values with Serial instances to the Serial instance of Results. So we use it to create a function from [0..7], a previously defined Serial value, to whatever we need, then we map our Persons to numbers and pass 'em in.

So now that we have a Serial instance for Person, we can use it to build more complex nested Serial instances. For "instance", if FairyTale is a list of Persons, we can use the Serial a => Serial [a] instance alongside our Serial Person instance to easily create a Serial FairyTale:

instance Serial FairyTale where
  series = map makeFairyTale . series
  coseries results = map (makeFairyTale .) . coseries results

(the (makeFairyTale .) composes makeFairyTale with each function coseries generates, which is a little confusing)

回答2:

If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)?

First of all, you need to decide which values you want to generate for each depth. There's no single right answer here, it depends on how fine-grained you want your search space to be.

Here are just two possible options:

people d = SnowWhite : map Dwarf [1..7] (doesn't depend on the depth)
people d = take d $ SnowWhite : map Dwarf [1..7] (each unit of depth increases the search space by one element)

After you've decided on that, your Serial instance is as simple as

instance Serial m Person where
    series = generate people

We left m polymorphic here as we don't require any specific structure of the underlying monad.

What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?

Use cons1:

instance Serial m FairyTale where
  series = cons1 makeTale

What is the relationship between the types Serial, Series, etc.?

Serial is a type class; Series is a type. You can have multiple Series of the same type — they correspond to different ways to enumerate values of that type. However, it may be arduous to specify for each value how it should be generated. The Serial class lets us specify a good default for generating values of a particular type.

The definition of Serial is

class Monad m => Serial m a where
  series   :: Series m a

So all it does is assigning a particular Series m a to a given combination of m and a.

What is the point of coseries?

It is needed to generate values of functional types.

How do I use the Positive type from SmallCheck.Series?

For example, like this:

> smallCheck 10 $ \n -> n^3 >= (n :: Integer)
Failed test no. 5.
there exists -2 such that
  condition is false

> smallCheck 10 $ \(Positive n) -> n^3 >= (n :: Integer)
Completed 10 tests without failure.

Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.

When you are writing a Serial instance (or any Series expression), you work in the Series m monad.

When you are writing tests, you work with simple functions that return Bool or Property m.

回答3:

While I think that @tel's answer is an excellent explanation (and I wish smallCheck actually worked the way he describes), the code he provides does not work for me (with smallCheck version 1). I managed to get the following to work...

UPDATE / WARNING: The code below is wrong for a rather subtle reason. For the corrected version, and details, please see this answer to the question mentioned below. The short version is that instead of instance Serial Identity Person one must write instance (Monad m) => Series m Person.

... but I find the use of Control.Monad.Identity and all the compiler flags bizarre, and I have asked a separate question about that.

Note also that while Series Person (or actually Series Identity Person) is not actually exactly the same as functions Depth -> [Person] (see @tel's answer), the function generate :: Depth -> [a] -> Series m a converts between them.

{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, FlexibleContexts, UndecidableInstances #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity

data Person = SnowWhite | Dwarf Int

instance Serial Identity Person where
        series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))