I need to parse a line containing an unsigned int, the character X
that is to be discarded, and a string, all separated by one or more spaces. e.g., 1234 X abcd
bool a = qi::phrase_parse(first, last,
uint_[ref(num) = _1] >> lit('X') >> lexeme[+(char_ - ' ')],
space, parsed_str);
The above code parses the three parts, but the string ends up containing a junk character (�abcd) and having a size of 5 instead of 4.
What is wrong with my parser, and why is there junk in the string?
What you probably haven't realized is that parser expressions stop having automatic attribute propagation in the presence of semantic actions*.
* Documentation background: How Do Rules Propagate Their Attributes?
You're using a semantic action to 'manually' propagate the attribute of the uint_ parser:
[ref(num) = _1] // this is a Semantic Action
So the easiest way to fix this would be to propagate num automatically too (the way the qi::parse and qi::phrase_parse APIs were intended):
bool ok = qi::phrase_parse(first, last, // input iterators
uint_ >> lit('X') >> lexeme[+(char_ - ' ')], // parser expr
space, // skipper
num, parsed_str); // output attributes
Or, addressing some off-topic points, even cleaner:
bool ok = qi::phrase_parse(first, last,
uint_ >> 'X' >> lexeme[+graph],
blank,
num, parsed_str);
As you can see, you can pass multiple lvalues as output attribute recipients.1, 2
See it live on Coliru.
There's a whole lot of magic going on, which in practice leads to my rule of thumb:
Avoid using semantic actions in Spirit Qi expressions unless you absolutely have to
I have written about this before, in an answer specifically about this: Boost Spirit: "Semantic actions are evil"?
In my experience, it's almost always cleaner to use the Attribute Customization Points to tweak the automatic propagation than to abandon auto rules and resort to manual attribute handling.
1 What technically happens to propagate these attributes is that num and parsed_str will be 'tied' to the whole parse expression as a Fusion sequence:
fusion::vector2<unsigned&, std::string&>
and the exposed attribute of the rule:
fusion::vector2<unsigned, std::vector<char> >
will be 'transformed' to that during assignment. The attribute compatibility rules allow this conversion, and many others.
2 Alternatively, use semantic actions for both:
bool ok = qi::phrase_parse(first, last,
(uint_ >> 'X' >> as_string [ lexeme[+graph] ])
[ phx::ref(num) = _1, phx::ref(parsed_str) = _2 ],
blank);
There are a few subtleties here:
we need as_string here to expose the attribute as std::string instead of std::vector<char> (see above)
we need to qualify phx::ref(parsed_str), since even a using-declaration for boost::phoenix::ref will not be enough to disambiguate std::ref and phx::ref: ADL will drag in std::ref since it is from the same namespace as the type of parsed_str.
group the semantic action to prevent partially assigned results, e.g. the following would overwrite num even though X may be missing in the input:
bool ok = qi::phrase_parse(first, last,
uint_ [ phx::ref(num) = _1 ]
>> 'X'
>> as_string [ lexeme[+graph] ] [ phx::ref(parsed_str) = _1 ],
blank);
All of this complexity can be hidden from your view if you avoid manual attribute propagation!