Boolean [removed]grammar) parser in c++

2019-01-01 12:37发布

问题:

I want to parse a boolean expression (in C++). Input form:

a and b xor (c and d or a and b);

I just want to parse this expression into a tree, knowing the precedence rule (not,and,xor,or). So the above expression should look something like:

(a and b) xor ((c and d) or (a and b));

to the parser.

And the tree would be of form:

                        a
                   and
                        b
               or
                        c
                   and
                        d
        xor
                   a
              and
                   b

The input will be either through command line or in the form of a string. I just need the parser.

Are there any sources that can help me do this?

回答1:

Here is an implementation based on Boost Spirit.

Because Boost Spirit generates recursive descent parsers based on expression templates, honouring the \'idiosyncratic\' (sic) precedence rules (as mentioned by others) is quite tedious. Therefore the grammar lacks a certain elegance.

Abstract Data Type

I defined a tree data structure using Boost Variant\'s recursive variant support, note the definition of expr:

struct op_or  {}; // tag
struct op_and {}; // tag
struct op_xor {}; // tag
struct op_not {}; // tag

typedef std::string var;
template <typename tag> struct binop;
template <typename tag> struct unop;

typedef boost::variant<var, 
        boost::recursive_wrapper<unop <op_not> >, 
        boost::recursive_wrapper<binop<op_and> >,
        boost::recursive_wrapper<binop<op_xor> >,
        boost::recursive_wrapper<binop<op_or> >
        > expr;

(full source below)

Grammar Rules

The following is the (slightly tedious) grammar definition, as mentioned.

Although I don\'t consider this grammar optimal, it is quite readable, and we have ourselves a statically compiled parser with strongly typed AST datatype in roughly 50 lines of code. Things could be considerably worse.

template <typename It, typename Skipper = qi::space_type>
    struct parser : qi::grammar<It, expr(), Skipper>
{
    parser() : parser::base_type(expr_)
    {
        using namespace qi;
        expr_  = or_.alias();

        or_  = (xor_ >> \"or\"  >> xor_) [ _val = phx::construct<binop<op_or >>(_1, _2) ] | xor_   [ _val = _1 ];
        xor_ = (and_ >> \"xor\" >> and_) [ _val = phx::construct<binop<op_xor>>(_1, _2) ] | and_   [ _val = _1 ];
        and_ = (not_ >> \"and\" >> not_) [ _val = phx::construct<binop<op_and>>(_1, _2) ] | not_   [ _val = _1 ];
        not_ = (\"not\" > simple       ) [ _val = phx::construct<unop <op_not>>(_1)     ] | simple [ _val = _1 ];

        simple = ((\'(\' > expr_ > \')\') | var_);
        var_ = qi::lexeme[ +alpha ];
    }

  private:
    qi::rule<It, var() , Skipper> var_;
    qi::rule<It, expr(), Skipper> not_, and_, xor_, or_, simple, expr_;
};

Operating on the syntax tree

Obviously, you\'d want to evaluate the expressions. For now, I decided to stop at just printing, so I don\'t have to do the lookup table for named variables :)

Traversing a recursive variant may look cryptic at first, but the boost::static_visitor<> is surprisingly simple once you get the hang of it:

struct printer : boost::static_visitor<void>
{
    printer(std::ostream& os) : _os(os) {}
    std::ostream& _os;

    //
    void operator()(const var& v) const { _os << v; }

    void operator()(const binop<op_and>& b) const { print(\" & \", b.oper1, b.oper2); }
    void operator()(const binop<op_or >& b) const { print(\" | \", b.oper1, b.oper2); }
    void operator()(const binop<op_xor>& b) const { print(\" ^ \", b.oper1, b.oper2); }

    void print(const std::string& op, const expr& l, const expr& r) const
    {
        _os << \"(\";
            boost::apply_visitor(*this, l);
            _os << op;
            boost::apply_visitor(*this, r);
        _os << \")\";
    }

    void operator()(const unop<op_not>& u) const
    {
        _os << \"(\";
            _os << \"!\";
            boost::apply_visitor(*this, u.oper1);
        _os << \")\";
    }
};

std::ostream& operator<<(std::ostream& os, const expr& e)
{ boost::apply_visitor(printer(os), e); return os; }

Test output:

For the test cases in the code, the following is output, demonstrating correct handling of the precedence rules by adding (redundant) parentheses:

result: ((a & b) ^ ((c & d) | (a & b)))
result: ((a & b) ^ ((c & d) | (a & b)))
result: (a & b)
result: (a | b)
result: (a ^ b)
result: (!a)
result: ((!a) & b)
result: (!(a & b))
result: (a | (b | c))

Full Code:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/variant/recursive_wrapper.hpp>

namespace qi    = boost::spirit::qi;
namespace phx   = boost::phoenix;

struct op_or  {};
struct op_and {};
struct op_xor {};
struct op_not {};

typedef std::string var;
template <typename tag> struct binop;
template <typename tag> struct unop;

typedef boost::variant<var, 
        boost::recursive_wrapper<unop <op_not> >, 
        boost::recursive_wrapper<binop<op_and> >,
        boost::recursive_wrapper<binop<op_xor> >,
        boost::recursive_wrapper<binop<op_or> >
        > expr;

template <typename tag> struct binop 
{ 
    explicit binop(const expr& l, const expr& r) : oper1(l), oper2(r) { }
    expr oper1, oper2; 
};

template <typename tag> struct unop  
{ 
    explicit unop(const expr& o) : oper1(o) { }
    expr oper1; 
};

struct printer : boost::static_visitor<void>
{
    printer(std::ostream& os) : _os(os) {}
    std::ostream& _os;

    //
    void operator()(const var& v) const { _os << v; }

    void operator()(const binop<op_and>& b) const { print(\" & \", b.oper1, b.oper2); }
    void operator()(const binop<op_or >& b) const { print(\" | \", b.oper1, b.oper2); }
    void operator()(const binop<op_xor>& b) const { print(\" ^ \", b.oper1, b.oper2); }

    void print(const std::string& op, const expr& l, const expr& r) const
    {
        _os << \"(\";
            boost::apply_visitor(*this, l);
            _os << op;
            boost::apply_visitor(*this, r);
        _os << \")\";
    }

    void operator()(const unop<op_not>& u) const
    {
        _os << \"(\";
            _os << \"!\";
            boost::apply_visitor(*this, u.oper1);
        _os << \")\";
    }
};

std::ostream& operator<<(std::ostream& os, const expr& e)
{ boost::apply_visitor(printer(os), e); return os; }

template <typename It, typename Skipper = qi::space_type>
    struct parser : qi::grammar<It, expr(), Skipper>
{
    parser() : parser::base_type(expr_)
    {
        using namespace qi;

        expr_  = or_.alias();

        or_  = (xor_ >> \"or\"  >> or_ ) [ _val = phx::construct<binop<op_or >>(_1, _2) ] | xor_   [ _val = _1 ];
        xor_ = (and_ >> \"xor\" >> xor_) [ _val = phx::construct<binop<op_xor>>(_1, _2) ] | and_   [ _val = _1 ];
        and_ = (not_ >> \"and\" >> and_) [ _val = phx::construct<binop<op_and>>(_1, _2) ] | not_   [ _val = _1 ];
        not_ = (\"not\" > simple       ) [ _val = phx::construct<unop <op_not>>(_1)     ] | simple [ _val = _1 ];

        simple = ((\'(\' > expr_ > \')\') | var_);
        var_ = qi::lexeme[ +alpha ];

        BOOST_SPIRIT_DEBUG_NODE(expr_);
        BOOST_SPIRIT_DEBUG_NODE(or_);
        BOOST_SPIRIT_DEBUG_NODE(xor_);
        BOOST_SPIRIT_DEBUG_NODE(and_);
        BOOST_SPIRIT_DEBUG_NODE(not_);
        BOOST_SPIRIT_DEBUG_NODE(simple);
        BOOST_SPIRIT_DEBUG_NODE(var_);
    }

  private:
    qi::rule<It, var() , Skipper> var_;
    qi::rule<It, expr(), Skipper> not_, and_, xor_, or_, simple, expr_;
};

int main()
{
    for (auto& input : std::list<std::string> {
            // From the OP:
            \"(a and b) xor ((c and d) or (a and b));\",
            \"a and b xor (c and d or a and b);\",

            /// Simpler tests:
            \"a and b;\",
            \"a or b;\",
            \"a xor b;\",
            \"not a;\",
            \"not a and b;\",
            \"not (a and b);\",
            \"a or b or c;\",
            })
    {
        auto f(std::begin(input)), l(std::end(input));
        parser<decltype(f)> p;

        try
        {
            expr result;
            bool ok = qi::phrase_parse(f,l,p > \';\',qi::space,result);

            if (!ok)
                std::cerr << \"invalid input\\n\";
            else
                std::cout << \"result: \" << result << \"\\n\";

        } catch (const qi::expectation_failure<decltype(f)>& e)
        {
            std::cerr << \"expectation_failure at \'\" << std::string(e.first, e.last) << \"\'\\n\";
        }

        if (f!=l) std::cerr << \"unparsed: \'\" << std::string(f,l) << \"\'\\n\";
    }

    return 0;
}

Bonus:

For bonus points, to get a tree exactly like shown in the OP:

static const char indentstep[] = \"    \";

struct tree_print : boost::static_visitor<void>
{
    tree_print(std::ostream& os, const std::string& indent=indentstep) : _os(os), _indent(indent) {}
    std::ostream& _os;
    std::string _indent;

    void operator()(const var& v) const { _os << _indent << v << std::endl; }

    void operator()(const binop<op_and>& b) const { print(\"and \", b.oper1, b.oper2); }
    void operator()(const binop<op_or >& b) const { print(\"or  \", b.oper2, b.oper1); }
    void operator()(const binop<op_xor>& b) const { print(\"xor \", b.oper2, b.oper1); }

    void print(const std::string& op, const expr& l, const expr& r) const
    {
        boost::apply_visitor(tree_print(_os, _indent+indentstep), l);
        _os << _indent << op << std::endl;
        boost::apply_visitor(tree_print(_os, _indent+indentstep), r);
    }

    void operator()(const unop<op_not>& u) const
    {
        _os << _indent << \"!\";
        boost::apply_visitor(tree_print(_os, _indent+indentstep), u.oper1);
    }
};

std::ostream& operator<<(std::ostream& os, const expr& e)
{ 
    boost::apply_visitor(tree_print(os), e); return os; 
}

result:

            a
        and 
            b
    or  
            c
        and 
            d
xor 
        a
    and 
        b


回答2:

Either use a parser generator as Oli Charlesworth already mentioned (yacc, bison, antlr; the latter is in my experience better suited for C++ than the other two although it is a while that I looked at any of them) or create a simple recursive descent parser: for a language as simple as yours this may be the easier approach.



回答3:

See my SO answer on how to code simple recursive descent parsers.

This approach is very convenient for simple languages such as Boolean expressions. And the concepts are pretty much independent of your programming language.



回答4:

If, like me, you find the overhead and idiosyncrasies of the parsing libraries too much for such a little job, you can very easily write your own parser for a simple scenario like the one you present. See here for a parser I wrote in C# to parse simple C# expressions along similar lines to your requirements.



回答5:

Have a look at the the Mini C example code https://github.com/boostorg/spirit/tree/master/example/qi/compiler_tutorial/mini_c.

Especially have a look at the expression.cpp, expression.hpp, expression_def.hpp, and ast.hpp. It gives a great example on how to parse expressions into an AST.