Consider the following toy grammar and parser:
(* in EBNF:
ap = "a", { "ba" }
bp = ap, "bc"
*)
let ap = sepBy1 (pstring "a") (pstring "b")
let bp = ap .>> (pstring "bc")
let test = run bp "abababc"
I get the following output:
Error in Ln: 1 Col: 7
abababc
^
Expecting: 'a'
Clearly sepBy1
sees the last b
and expects it to lead into another a
, failing when it doesn't find one. Is there a variant of sepBy1
which would backtrack over b
and make this parse succeed? Is there any reason why I shouldn't use that instead?
This is one way to implement such a variant of sepBy1
:
let backtrackingSepBy1 p sep = pipe2 p (many (sep >>? p)) (fun hd tl -> hd::tl)
Avoiding backtracking in your parser grammar usually makes the parser faster, more portable and easier to debug. Hence, if you have the chance to avoid backtracking by restructuring the grammar (without complicating it too much), I'd recommend doing it.
I've been looking through the FParsec source for the sepBy
family of functions, and it looks like there's no way (currently) to do precisely what you're asking for with any of the sepBy
functions. However, you might be able to approximate it well enough if your separator parser is simple.
To approximate what you're looking for, you could try using sepEndBy
, or more likely sepEndBy1
in your case. Its intended use is to consume things like Python syntax for lists, where the final item is allowed to have a trailing comma: [ 1, 2, 3, ]
. Using sepEndBy1
will consume the final separator, because that's what you want in its intended use. In your use case, though, you would have to work around the fact that sepEndBy1
consumes the final separator. E.g.,
// This parser will be either a separator *or* a prefix of the next item
let sharedParser = pstring "b"
let ap = sepEndBy1 (pstring "a") sharedParser
let restOfBp = pstring "c"
let bp = restOfBp <|> (pipe2 sharedParser restOfBp (fun b c -> b + c))
Here, the combining function in pipe2
is simple string concatenation, but in your actual usage scenario it might be different. The key idea is that if you are able to write a simple function to combine your b
parser with your c
parser to produce the bc
result, and if the c
parser is not too complicated, then this will work for you. If c
is extremely complicated but b
is simple, then just reverse the order of the two sides of <|>
, so that b
is tested first and c
is only tested after b
has either succeeded or failed; that way you won't have to apply the c
parser twice, as you would in the sample code I just posted. I wrote it the way I did because that's slightly easier to understand.
If sepEndBy1
does not work for you, because both the b
and c
parsers in your real-life case are too expensive to parse twice, then the only solution I can think of would be to ask for this new sepBy
variant to be added to FParsec as a new feature. My look through the FParsec code suggests that it would be possible to rewrite the implementation to optionally backtrack to before the final separator, but that it would need some significant testing and consideration of edge cases so it wouldn't be a five-minute fix. But if the sepEndBy1
workaround doesn't work for you, then go ahead and submit that feature request.
I'm not sure if the returned value matters, but if not, then the easiest way to go is to transcribe the grammar more closely:
let ap = pstring "a" >>. many (pstring "ba")
let bp = ap .>> pstring "bc"
Note that the pstring "ba"
will not cause the same problem you had, because pstring
cannot consume only part of its argument; either it consumes a full "ba"
, or it fails without consuming anything and therefore bp
can successfully see a b
if there is one.
Another possibility , if this is indeed the full grammar (ie ap
is not used in another production elsewhere), is to change it to the following EBNF, which is equivalent:
ap = "ab", { "ab" }
cp = ap, "c"
This makes it easier to implement with FParsec:
let ap = many1 (pstring "ab")
let cp = ap .>> pstring "c"