inconsistent behavior of boost spirit grammar [dup

2019-02-24 21:51发布

问题:

This question already has an answer here:

  • undefined behaviour somewhere in boost::spirit::qi::phrase_parse 1 answer

I have a little grammar that I want to use for a work project. A minimum executable example is:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi_grammar.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#pragma GCC diagnostic pop // pops

#include <iostream>

int main()

{
    typedef  unsigned long long ull;

    std::string curline = "1;2;;3,4;5";
    std::cout << "parsing:  " << curline << "\n";

    namespace qi = boost::spirit::qi;
    auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto match_type_res = ids % ';' ;
    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), match_type_res, r);

    std::cout << "got:      ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}

On my personal machine this produces the correct output: parsing: 1;2;;3,4;5 got: 1,;2,;;3,4,;5,;

But at work it produces: parsing: 1;2;;3,4;5 got: 1,;2,;;3,

In other words, it fails to parse the vector of long integers as soon as there's more than one element in it.

Now, I have identified that the work system is using boost 1.56, while my private computer is using 1.57. Is this the cause?

Knowning we have some real spirit experts here on stack overflow, I was hoping someone might know where this issue is coming from, or can at least narrow down the number of things I need to check.

If the boost version is the problem, I can probably convince the company to upgrade, but a workaround would be welcome in any case.

回答1:

You're invoking Undefined Behaviour in your code.

Specifically where you use auto to store a parser expression. The Expression Template contains references to temporaries that become dangling at the end of the containing full-expression¹.

UB implies that anything can happen. Both compilers are right! And the best part is, you will probably see varying behaviour depending on the compiler flags used.

Fix it either by using:

  • qi::copy (or boost::proto::deep_copy before v.1.55 IIRC)
  • use BOOST_SPIRIT_AUTO instead of BOOST_AUTO (mostly helpful iff you also support C++03)
  • use qi::rule<> and qi::grammar<> (the non-terminals) to type-erase and the expressions. This has performance impact too, but also gives more features like

    • recursive rules
    • locals and inherited attributes
    • declared skippers (handy because rules can be implicitly lexeme[] (see here)
    • better code organization.

Note also that Spirit X3 promises to drop there restrictions on the use with auto. It's basically a whole lot more lightweight due to the use of c++14 features. Keep in mind that it's not stable yet.

  • Showing that GCC with -O2 shows undefined results; Live On Coliru

  • The fixed version:

Live On Coliru

//#pragma GCC diagnostic push
//#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
//#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
//#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi.hpp>
//#pragma GCC diagnostic pop // pops

#include <iostream>

int main() {
    typedef  unsigned long long ull;

    std::string const curline = "1;2;;3,4;5";
    std::cout << "parsing: '" << curline << "'\n";

    namespace qi = boost::spirit::qi;

#if 0 // THIS IS UNDEFINED BEHAVIOUR:
    auto ids     = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto grammar = ids % ';';
#else // THIS IS CORRECT:
    auto ids     = qi::copy(-qi::ulong_long % ','); // '-' allows for empty vecs.
    auto grammar = qi::copy(ids % ';');
#endif

    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), grammar, r);

    std::cout << "got:      ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}

Printing (also with GCC -O2!):

parsing: '1;2;;3,4;5'
got:      1,;2,;;3,4,;5,;

¹ (that's basically "at the next semicolon" here; but in standardese)