Why is parser combinator “seq” defined with “bind”

2019-07-07 12:30发布

I am reading this article about parser combinators and didn't understand the following:

They say that using seq (see below) leads to parsers with nested tuples as results, which are messy to manipulate.

 seq :: Parser a -> Parser b -> Parser (a,b)
 p ‘seq‘ q = \inp -> [((v,w),inp’’) | (v,inp’) <- p inp, (w,inp’’) <- q inp’]

In order to avoid this problem of nested tuples they introduce monadic bind and return for parsers and then define seq as follows:

 p ‘seq‘ q = p ‘bind‘ \x -> q ‘bind‘ \y -> result (x,y)

Unfortunately I don not see what the problem of nested tuples is and why the 2-nd implementation of seq is better then the 1-st one. Could you help me to understand it?

2条回答
甜甜的少女心
2楼-- · 2019-07-07 13:06

The first example extendend to the type ((a,b),(c,d,e)):

seq232 ((p,q),(r,s,t) = \inp ->
     [ (((v,w),(x,y,z)),inp’’''')
     | (v, inp’) <- p inp
     , (w, inp’’) <- q inp’
     , (x, inp''') <- r inp''
     , (y, inp'''') <- s inp'''
     , (z, inp''''') <- t imp''''
     ]

The second example extended to the type ((a,b),(c,d,e)):

seq232 ((p,q),(r,s,t)) =
    p ‘bind‘ \v ->
    q ‘bind‘ \w ->
    r `bind` \x ->
    s `bind` \y ->
    t `bind` \z ->
    result ((v,w),(x,y,z))

While it isn't a a lot better, I think you can see that the second is a bit cleaner.

查看更多
Rolldiameter
3楼-- · 2019-07-07 13:22

I dare to say that the applicative interface is actually older than the monadic interface. It is unfortunately only getting more attention in recent years.

The idea is that with the seq combinator (not a fortunate name) the elementary parsers return a value of a "simple" type, which are then combined into a final result which is a more complex type.

The idea of the applicative interface is to have a sequential composition combinator of type

(<*>) :: Parser (b -> a) -> Parser b -> Parser a

We start out by lifting a function (which has a "complicated" type) into a parser which is then used to combine the arguments returns by subsequent parsers into a result. Special versions are <* and *> which discard the result at the side of the missing angle bracket. Together you will find this interface described in the module Control.Applicative.

Why are monadic parsers to be avoided? Since the right hand side parser of a >>= combinator depends on the result of the parser at the left hand side this parser is constructed over and over again during parsing. The message is: only use the monadic interface if it is really needed, i.e. if you want to construct parsers which depend on parsing results of parser which run earlier.

An example of an applicative parser. Suppose we want to recognise nested parentheses and compute the maximum nesting depth. The grammar S -> (S)S | epsilon describes this structure, which is directly reflected in the parser.

pS = (max . (+1)) <$ pSym '(' <*> pS <* pSym ')' <|> pure 0

A typical example of a case where monadic parsers come in handy is in e.g. recognising a^{n}b^{n}c^{n}. We start by recognising a number of a's and then want to see that many b'c and c's

pAs       = length <$> many(pSym 'a')
pSyms c 0 = pure 0
pSyms c n = (+1) <$ pSym c <*> pSyms c (n-1)
pABC = do count <- pAs
          pSyms 'b' count
          pSyms 'c' count

Essentially is that in the next statements in the do-construct the result of earlier statements is being used, and not only at the end to combine them without performing any parsing in the combination step.

See also:

@inproceedings{SwieDupo96,
Author = {Swierstra, S. D. and Duponcheel, L.},
Booktitle = {Advanced Functional Programming},
Date-Added = {2009-01-04 17:21:54 +0100},
Date-Modified = {2009-01-04 17:21:54 +0100},
Editor = {Launchbury, John and Meijer, Erik and Sheard, Tim},
Pages = {184-207},
Publisher = {Springer-Verlag},
Series = {LNCS-Tutorial},
Title = {Deterministic, Error-Correcting Combinator Parsers},
Urlpdf = {http://www.cs.uu.nl/people/doaitse/Papers/1996/DetErrCorrComPars.pdf},
Volume = {1129},
Year = {1996},
}

which explains in the end why monads should be avoided if possible (just as you should use regular expression when context-free grammars are overkill).

查看更多
登录 后发表回答