What causes Happy to throw a parse error?

2019-05-19 01:33发布

I've written a lexer in Alex and I'm trying to hook it up to a parser written in Happy. I'll try my best to summarize my problem without pasting huge chunks of code.

I know from my unit tests of my lexer that the string "\x7" is lexed to:

[TokenNonPrint '\x7', TokenEOF]

My token type (spit out by the lexer), is Token. I've defined lexWrap and alexEOF as described here, which gives me the following header and token declarations:

%name parseTokens 
%tokentype { Token }
%lexer { lexWrap } { alexEOF }
%monad { Alex }
%error { parseError }

%token
  NONPRINT {TokenNonPrint $$}
  PLAIN { TokenPlain $$ }

I invoke the parser+lexer combo with the following:

parseExpr :: String -> Either String [Expr]
parseExpr s = runAlex s parseTokens

And here are my first few productions:

exprs :: { [Expr] }
exprs
  : {- empty -} { trace "exprs 30" [] }
  | exprs expr { trace "exprs 31" $ $2 : $1 }

nonprint :: { Cmd }
  : NONPRINT { NonPrint $ parseNonPrint $1}

expr :: { Expr }
expr
  : nonprint {trace "expr 44" $ Cmd $ $1}
  | PLAIN { trace "expr 37" $ Plain $1 }

I'll leave out the datatype declarations of Expr and NonPrint since they're long and only the constructors Cmd and NonPrint matter here. The function parseNonPrint is defined at the bottom of Parse.y as:

parseNonPrint :: Char -> NonPrint
parseNonPrint '\x7' = Bell

Also, my error handling function looks like:

parseError :: Token -> Alex a
parseError tokens = error ("Error processing token: " ++ show tokens)

Written like this, I expect the following hspec test to pass:

parseExpr "\x7" `shouldBe` Right [Cmd (NonPrint Bell)]

But instead, I see "exprs 30" print once (even though I'm running 5 different unit tests) and all of my tests of parseExpr return Right []. I don't understand why that would be the case, but I changed the exprs production to prevent it:

exprs :: { [Expr] }
exprs
  : expr { trace "exprs 30" [$1] }
  | exprs expr { trace "exprs 31" $ $2 : $1 }

Now all of my tests fail on the first token they hit --- parseExpr "\x7" fails with:

uncaught exception: ErrorCall (Error processing token: TokenNonPrint '\a')

And I'm thoroughly confused, since I would expect the parser to take the path exprs -> expr -> nonprint -> NONPRINT and succeed. I don't see why this input would put the parser in an error state. None of the trace statements are hit (optimized away?).

What am I doing wrong?

1条回答
来,给爷笑一个
2楼-- · 2019-05-19 02:11

It turns out the cause of this error was the innocuous line

%lexer { lexWrap } { alexEOF }

which was recommended by the linked question about using Alex with Happy (unfortunately, one of the top Google results for queries like "using Alex as a monadic lexer with Happy). The fix is to change it to the following:

%lexer { lexWrap } { TokenEOF }

I had to dig in to the generated code to uncover the issue. It is caused by the code derived from the %tokens directive, which looks as follows (I commented out all of my token declarations except for TokenNonPrint while trying to track down the error):

happyNewToken action sts stk
    = lexWrap(\tk -> 
    let cont i = happyDoAction i tk action sts stk in
    case tk of {
    alexEOF -> happyDoAction 2# tk action sts stk; -- !!!!
    TokenNonPrint happy_dollar_dollar -> cont 1#;
    _ -> happyError' tk
    })

Evidently, Happy transforms each line of the %tokens directive in to one branch of a pattern match. It also inserts a branch for whatever was identified to it as the EOF token in the %lexer directive.

By inserting the name of a value, alexEOF, rather than a data constructor, TokenEOF, this branch of the case statement has the effect of re-binding the name alexEOF to whatever token was passed in to lexWrap, shadowing the original binding and short-circuiting the case statement so that it hits the EOF rule every time, which somehow results in Happy entering an error state.

The mistake isn't caught by the type system, since the identifier alexEOF (or TokenEOF) doesn't appear anywhere else in the generated code. Misusing the %lexer directive like this will cause GHC to emit a warning, but, since the warning appears in generated code, it's impossible to distinguish it from all of the other harmless warnings the code throws out.

查看更多
登录 后发表回答