Haskell: Graham Hutton Book Parsing (Ch-8): What d

2019-07-15 02:13发布

My question is about Graham Hutton's book Programming in Haskell 1st Ed.

There is a parser created in section 8.4, and I am assuming anyone answering has the book or can see the link to slide 8 in the link above.

A basic parser called item is described as:

type Parser a = String -> [(a, String)]

item :: Parser Char

item = \inp -> case inp of
        [] -> []
        (x:xs) -> [(x,xs)]

which is used with do to define another parser p (the do parser)

p :: Parser (Char, Char)

p = do x <- item
       item
       y <- item
       return (x,y)

the relevant bind definition is:

(>>=) :: Parser a -> (a -> Parser b) -> Parser b

p >>= f = \inp -> case parse p inp of
                       [] -> []
                       [(v,out)] -> parse (f v) out

return is defined as:

return  :: a -> Parser a

return v = \inp -> [(v,inp)]

parse is defined as:

parse :: Parser a -> String -> [(a,String)]

parse p inp = p inp

The program (the do parser) takes a string and selects the 1st and 3rd characters and returns them in a tuple with the remainder of the string in a list, e.g., "abcdef" produces [('a','c'), "def"].

I want to know how the (f v) out in [(v,out)] -> parse (f v) out returns a parser which is then applied to out.

  • f in the do parser is item and item taking a character 'c' returns [('c',[])]?

  • How can that be a parser and how can it take out as an argument?

Perhaps I am just not understanding what (f v) does.

  • Also how does the do parser 'drop' the returned values each time to operate on the rest of the input string when item is called again?

  • What is the object that works its way through the do parser, and how is it altered at each step, and by what means is it altered?

3条回答
放我归山
2楼-- · 2019-07-15 02:14

The relevant advice is, Don't panic (meaning, don't rush it; or, take it slow), and, Follow the types.

First of all, Parsers

type Parser a = String -> [(a,String)]

are functions from String to lists of pairings of result values of type a and the leftover Strings. That leftovers string will be used as input for next parsing step. That's the main thing about it here.

You are asking, in

p >>= f = \inp -> case (parse p inp) of
                       [] -> []
                       [(v,out)] -> parse (f v) out

how the (f v) in [(v,out)] -> parse (f v) out returns a parser which is then applied to out?

The answer is, f's type says that it does so:

(>>=) :: Parser a -> (a -> Parser b) -> Parser b    -- or, the equivalent
(>>=) :: Parser a -> (a -> Parser b) -> (String -> [(b,String)])
--       p              f                inp

We have f :: a -> Parser b, so that's just what it does: applied to a value of type a it returns a value of type Parser b. Or equivalently,

f    :: a       -> (String  -> [(b,String)])      -- so that
f (v :: a)      ::  String  -> [(b,String)]       -- and,
f (v :: a) (out ::  String) :: [(b,String)]  

So whatever is the value that parse p inp produces, it must be what f is waiting for to proceed. The types must "fit":

p ::       Parser a                     --   m  a
f ::              a -> Parser b         --      a -> m  b
f <$> p :: Parser    ( Parser b )       --   m     ( m  b )
p >>= f :: Parser             b         --   m          b

or, equivalently,

p ::       String -> [(a,   String)]
--         inp         v    out
f ::                   a -> String   -> [(b, String)]
--                     v    out
p >>= f :: String                    -> [(b, String)]     -- a combined Parser
--         inp                            v2  out2

So this also answers your second question,

How can that be a parser and how can it take out as an argument?

The real question is, what kind of f is it, that does such a thing? Where does it come from? And that's your fourth question.

And the answer is, your example in do-notation,

p :: Parser (Char, Char)

p = do x <- item
       _ <- item
       y <- item
       return (x,y)

by Monad laws is equivalent to the nested chain

p = do { x <- item
       ; do { _ <- item
            ; do { y <- item
                 ; return (x,y) }}}

which is a syntactic sugar for the nested chain of Parser bind applications,

p :: Parser (Char, Char) -- ~ String -> [((Char,Char), String)]

p = item >>= (\ x ->              -- item :: Parser Char ~ String -> [(Char,String)] 
               item >>= (\ _ ->                      -- x :: Char
                          item >>= (\ y ->           -- y :: Char
                                     return (x,y) )))

and it is because the functions are nested that the final return has access to both y and x there; and it is precisely the Parser bind that arranges for the output leftovers string to be used as input to the next parsing step:

p = item >>= f     -- :: String -> [((Char,Char), String)]
    where    
           { f x = item >>= f2
                    where { f2 _ = item >>= f3
                                    where { f3 y = return (x,y) }}}

i.e. (under the assumption that inp is a string of length two or longer),

parse p inp                            -- assume that `inp`'s 
  = (item >>= f) inp                   --  length is at least 2     NB.
  =
    let  [(v, left)] = item inp        -- by the def of >>=
    in                                
        (f v) left
  =
    let  [(v, left)] = item inp
    in
       let  x = v                      -- inline the definition of `f`
       in  (item >>= f2) left
  =
    let  [(v, left)] = item inp
    in let  x = v 
       in let  [(v2, left2)] = item left     -- by the def of >>=, again
          in (f2 v2) left2
  =
    ..........
  =
    let  [(x,left1)] = item inp        -- x <- item
         [(_,left2)] = item left1      -- _ <- item
         [(y,left3)] = item left2      -- y <- item
    in 
        [((x,y), left3)]
  =
    let   (x:left1)  = inp             -- inline the definition
          (_:left2)  = left1           -- of `item`
          (y:left3)  = left2
    in 
        [((x,y), left3)]
  =
    let   (x:_:y:left3) = inp
    in 
        [((x,y), left3)]

after few simplifications.

And this answers your third question.

查看更多
来,给爷笑一个
3楼-- · 2019-07-15 02:31

I am having similar problems reading the syntax, because it's not what we are used to.

(>>=) :: Parser a -> (a -> Parser b) -> Parser b

p >>= f = \inp -> case parse p inp of
                       [] -> []
                       [(v,out)] -> parse (f v) out

so for the question: I want to know how the (f v) out in [(v,out)] -> parse (f v) out returns a parser which is then applied to out.

It does because thats the siganture of the 2nd arg (the f): (>>=) :: Parser a -> (a -> Parser b) -> Parser b .... f takes an a and produces a Parser b. a Parser b takes a String which is the out ... (f v) out

But the output of this should not be mixed up with the output of the function we are writing: >>=

  • We are outputing a parser ... (>>=) :: Parser a -> (a -> Parser b) -> Parser b .
  • The Parser we are outputing has the job of wrapping and chaining the first 2 args

A parser is a function that takes 1 arg. This is constructed right after the first = ... ie by returning an (anonymous) function: p >>= f = \inp -> ... so inp refers to the input string of the Parser we are building

so what is left is to define what that constructed function should do ... NOTE: we are not implementing any of the input parsers just chaining them together ... so the ouput Parser function should:

  • apply the input parser (p) to the its input (inp): p >>= f = \inp -> case parse p inp of
  • take the output of that parse [(v, out)] v is the result out is what remains of the input
  • apply the input function (f is (a -> Parser b)) to the parsed result (v)
  • f produces a Parser b (a function that takes 1 arg)
  • so apply that output parser to the remainder of the first parser (out)

For me the understanding lies in the use of destructuring and the realisation that we are constructing a function that glues together the execution of other functions together simply considering their interface

Hope that helps ... it helped me to write it :-)

查看更多
The star\"
4楼-- · 2019-07-15 02:33

f v produces a Parser b because f is a function of type a -> Parser b and v is a value of type a. So then you're calling parse with this Parser b and the string out as arguments.

F in the 'do' parser is item

No, it's not. Let's consider a simplified (albeit now somewhat pointless) version of your parser:

p = do x <- item
       return x

This will desugar to:

p = item >>= \x -> return x

So the right operand of >>=, i.e. f, is \x -> return x, not item.

Also how does the 'do' parser 'drop' the returned values each time to operate on the rest of the input string when item is called again? What is the object that works its way through the 'do' parser and how is it altered and each step and by what means is it altered?

When you apply a parser it returns a tuple containing the parsed value and a string representing the rest of the input. If you look at item for example, the second element of the tuple will be xs which is the tail of the input string (i.e. a string containing all characters of the input string except the first). This second part of the tuple will be what's fed as the new input to subsequent parsers (as per [(v,out)] -> parse (f v) out), so that way each successive parser will take as input the string that the previous parser produced as the second part of its output tuple (which will be a suffix of its input).


In response to your comments:

When you write "p = item >>= \x -> return x", is that the equivalent of just the first line "p = do x <- item"?

No, it's equivalent to the entire do-block (i.e. do {x <- item; return x}). You can't translate do-blocks line-by-line like that. do { x <- foo; rest } is equivalent to foo >>= \x -> do {rest}, so you'll always have the rest of the do-block as part of the right operand of >>=.

but not how that reduces to simply making 'out' available as the input for the next line. What is parse doing if the next line of the 'do' parser is a the item parser?

Let's walk through an example where we invoke item twice (this is like your p, but without the middle item). In the below I'll use === to denote that the expressions above and below the === are equivalent.

do x <- item
   y <- item
   return (x, y)
=== -- Desugaring do
item >>= \x -> item >>= \y -> return (x, y)
=== -- Inserting the definition of >>= for outer >>=
\inp -> case parse item inp of
             [] -> []
             [(v,out)] -> parse (item >>= \y -> return (v, y)) out

Now let's apply this to the input "ab":

case parse item "ab" of
     [] -> []
     [(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== Insert defintiion of `parse`
case item "ab" of
     [] -> []
     [(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== Insert definition of item
case ('a', "b") of
     [] -> []
     [(v,out)] -> parse (item >>= \y -> return (v, y)) out
===
parse (item >>= \y -> return ('a', y)) out

Now we can expand the second >>= the same we did the fist and eventually end up with ('a', 'b').

查看更多
登录 后发表回答