I've written a lexer in Alex and I'm trying to hook it up to a parser written in Happy. I'll try my best to summarize my problem without pasting huge chunks of code.
I know from the unit tests of my lexer that the string "\x7" is lexed to:
[TokenNonPrint '\x7', TokenEOF]
My token type (spit out by the lexer) is Token. I've defined lexWrap and alexEOF as described here, which gives me the following header and token declarations:
%name parseTokens
%tokentype { Token }
%lexer { lexWrap } { alexEOF }
%monad { Alex }
%error { parseError }
%token
NONPRINT {TokenNonPrint $$}
PLAIN { TokenPlain $$ }
I invoke the parser+lexer combo with the following:
parseExpr :: String -> Either String [Expr]
parseExpr s = runAlex s parseTokens
And here are my first few productions:
exprs :: { [Expr] }
exprs
: {- empty -} { trace "exprs 30" [] }
| exprs expr { trace "exprs 31" $ $2 : $1 }
nonprint :: { Cmd }
: NONPRINT { NonPrint $ parseNonPrint $1}
expr :: { Expr }
expr
: nonprint {trace "expr 44" $ Cmd $ $1}
| PLAIN { trace "expr 37" $ Plain $1 }
I'll leave out the datatype declarations of Expr and NonPrint since they're long and only the constructors Cmd and NonPrint matter here. The function parseNonPrint is defined at the bottom of Parse.y as:
parseNonPrint :: Char -> NonPrint
parseNonPrint '\x7' = Bell
Also, my error handling function looks like:
parseError :: Token -> Alex a
parseError tokens = error ("Error processing token: " ++ show tokens)
Written like this, I expect the following hspec test to pass:
parseExpr "\x7" `shouldBe` Right [Cmd (NonPrint Bell)]
But instead, I see "exprs 30" printed once (even though I'm running 5 different unit tests), and all of my tests of parseExpr return Right []. I don't understand why that would be the case, but I changed the exprs production to prevent it:
exprs :: { [Expr] }
exprs
: expr { trace "exprs 30" [$1] }
| exprs expr { trace "exprs 31" $ $2 : $1 }
Now all of my tests fail on the first token they hit: parseExpr "\x7" fails with:
uncaught exception: ErrorCall (Error processing token: TokenNonPrint '\a')
And I'm thoroughly confused, since I would expect the parser to take the path exprs -> expr -> nonprint -> NONPRINT and succeed. I don't see why this input would put the parser in an error state. None of the trace statements are hit (optimized away?).
What am I doing wrong?
It turns out the cause of this error was the innocuous line
%lexer { lexWrap } { alexEOF }
which was recommended by the linked question about using Alex with Happy (unfortunately, one of the top Google results for queries like "using Alex as a monadic lexer with Happy"). The fix is to change it to the following:
%lexer { lexWrap } { TokenEOF }
I had to dig into the generated code to uncover the issue. It is caused by the code derived from the %token directive (I commented out all of my token declarations except for TokenNonPrint while trying to track down the error).
Evidently, Happy transforms each line of the %token directive into one branch of a pattern match. It also inserts a branch for whatever was identified to it as the EOF token in the %lexer directive.
By inserting the name of a value, alexEOF, rather than a data constructor, TokenEOF, this branch of the case statement has the effect of re-binding the name alexEOF to whatever token was passed in to lexWrap, shadowing the original binding and short-circuiting the case statement so that it hits the EOF rule every time, which somehow results in Happy entering an error state.
The mistake isn't caught by the type system, since the identifier alexEOF (or TokenEOF) doesn't appear anywhere else in the generated code. Misusing the %lexer directive like this will cause GHC to emit a warning, but, since the warning appears in generated code, it's impossible to distinguish it from all of the other harmless warnings the code throws out.