Boost.Spirit: parsing a number, a char and a string

Posted 2019-07-09 20:33

Question:

I need to parse a line containing an unsigned int, the character X that is to be discarded, and a string, all separated by one or more spaces. e.g., 1234 X abcd

bool a = qi::phrase_parse(first, last,
      uint_[ref(num) = _1] >> lit('X') >> lexeme[+(char_ - ' ')],
      space, parsed_str);

The above code parses the three parts, but the string ends up containing a junk character (�abcd) and having a size of 5 and not 4.

What is wrong with my parser, and why is there junk in the string?

Answer 1:

What you probably haven't realized is that parser expressions stop propagating attributes automatically in the presence of semantic actions*.

* Documentation background: How Do Rules Propagate Their Attributes?

You're using a semantic action to 'manually' propagate the attribute of the uint_ parser:

[ref(num) = _1]   // this is a Semantic Action

So the easiest fix is to propagate num automatically too (the way the qi::parse and qi::phrase_parse APIs were intended to be used):

bool ok = qi::phrase_parse(first, last,               // input iterators
        uint_ >> lit('X') >> lexeme[+(char_ - ' ')],  // parser expr
        space,                                      // skipper
        num, parsed_str);                            // output attributes

Or, addressing some off-topic points, even cleaner:

bool ok = qi::phrase_parse(first, last,
        uint_ >> 'X' >> lexeme[+graph],
        blank, 
        num, parsed_str);

As you can see, you can pass multiple lvalues as output attribute recipients (see notes 1 and 2 below).

See a live demo on Coliru (link)

There's a whole lot of magic going on, which in practice leads to my rule of thumb:

Avoid using semantic actions in Spirit Qi expressions unless you absolutely have to

I have written about this before, in an answer specifically on this topic: Boost Spirit: "Semantic actions are evil"?

In my experience, it's almost always cleaner to use the Attribute Customization Points to tweak the automatic propagation than to abandon auto rules and resort to manual attribute handling.


1 Technically, to propagate these attributes, num and parsed_str are 'tied' to the whole parse expression as a Fusion sequence:

fusion::vector2<unsigned&, std::string&>

and the exposed attribute of the rule:

fusion::vector2<unsigned, std::vector<char> >

will be 'transformed' to that during assignment. The attribute compatibility rules allow this conversion, and many others.


2 Alternatively, use semantic actions for both:

bool ok = qi::phrase_parse(first, last,
        (uint_ >> 'X' >> as_string [ lexeme[+graph] ]) 
            [ phx::ref(num) = _1, phx::ref(parsed_str) = _2 ],
        blank);

There are a few subtleties here:

  • we need as_string here to expose the attribute as std::string instead of std::vector<char> (see above)

  • we need to qualify phx::ref(parsed_str) explicitly: even a using-declaration for boost::phoenix::ref is not enough to disambiguate std::ref from phx::ref, because ADL drags in std::ref, which lives in the same namespace as the type of parsed_str.

  • group the semantic action to prevent partially assigned results, e.g. the following would overwrite num even though X may be missing in the input:

    bool ok = qi::phrase_parse(first, last,
           uint_ [ phx::ref(num) = _1 ] 
        >> 'X' 
        >> as_string [ lexeme[+graph] ] [ phx::ref(parsed_str) = _1 ],
        blank);
    

All of this complexity can be hidden from your view if you avoid manual attribute propagation!