Build parser from grammar at runtime

2020-02-25 23:18发布

问题:

Many (most) regular expression libraries for C++ allow for creating the expression from a string during runtime. Is anyone aware of any C++ parser generators that allow for feeding a grammar (preferably BNF) represented as a string into a generator at runtime? All the implementations I've found either require an explicit code generator to be run or require the grammar to be expressed via clever template meta-programming.

回答1:

It should be pretty easy to build a recursive descent, backtracking parser that accepts a grammar as input. You can reduce all your rules to the following form (or act as if you have):

 A = B C D ;

Parsing such a rule by recursive descent is easy: call a routine that corresponds to finding a B, then one that finds a C, then one that finds a D. Given you are doing a general parser, you can always call a "parse_next_sentential_form(x)" function, and pass the name of the desired form (terminal or nonterminal token) as x (e.g., "B", "C", "D").

In processing such a rule, the parser wants to produce an A, by finding a B, then C, then D. To find B (or C or D), you'd like to have an indexed set of rules in which all the left-hand sides are the same, so one can easily enumerate the B-producing rules, and recurse to process their content. If your parser gets a failure, it simply backtracks.

This won't be a lightning fast parser, but shouldn't be terrible if well implemented.

One could also use an Earley parser, which parses by creating states of partially-processed rules.

If you wanted it to be fast, I suppose you could simply take the guts of Bison and make it into a library. Then if you have grammar text or grammar rules (different entry points into Bison), you could start it and have it produce its tables in memory (which it must do in some form). Don't spit them out; simply build an LR parsing engine that uses them. Voila, on-the-fly efficient parser generation. You have to worry about ambiguities and the LALR(1)ness of your grammar if you do this; the previous two solutions work with any context free grammar.



回答2:

I am not aware of an existing library for this. However if performance and robustness are not critical, then you can spin off bison or any other tool that generates C code (via popen(3) or similar), spin off gcc on the generated code, link it into shared library and load the library via dlopen(3)/dlsym(3). On Windows -- DLL and LoadLibrary() instead.



回答3:

The easiest option is to embed some scripting language or even a full-blown VM (e.g., Mono), and run your generated parsers on top of it. Lua has quite a powerful JIT compiler, decent metaprogramming capabilities and several Packrat implementations ready to use, so probably it would be the least effort way.



回答4:

I just came across this http://cocom.sourceforge.net/ammunition++-13.html
The last one is an Earley Parser and it appears to take the grammar as a string.
One of the functions is:

Public function `parse_grammar'

         `int parse_grammar (int strict_p, const char *description)'

is another function which tunes the parser to given grammar. The grammar is given by string `description'. The description is similiar YACC one.

The actual code is at http://sourceforge.net/projects/cocom/

EDIT

A newer version is at https://github.com/vnmakarov/yaep



回答5:

boost::spirit is a C++ parsing framework that can be used to construct parsers dynamically at runtime.