I've been experimenting with Qi to parse a simple, newline-delimited
file of vertices, in the following format (expressed in my made-up, easy-to-read notation):
double double double optional(either (int int int optional(int)) or (double double double optional(double)))
My test cases start failing once I use repeat, and I can't find the error. The comments in the code are hopefully more enlightening:
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
using namespace boost::spirit;
qi::rule<std::string::iterator, ascii::space_type> vertexRule =
    (double_ >> double_ >> double_);
qi::rule<std::string::iterator, ascii::space_type> colorRule =
    (double_ >> double_ >> double_ >> -(double_)) | (uint_ >> uint_ >> uint_ >> -(uint_));

template<typename Iterator, typename Rule>
bool parseIt(Iterator begin, Iterator end, Rule rule) {
    bool r = qi::phrase_parse(
        begin, end,
        rule,
        ascii::space
    );
    if(begin != end) {
        std::cout << "No full match!" << std::endl;
        while(begin != end)
            std::cout << *begin++;
        return false;
    }
    return r;
}

int main()
{
    qi::rule<std::string::iterator, ascii::space_type> rule1 =
        repeat(3)[vertexRule >> -(colorRule)];
    std::string t1{
        "20.0 20.0 20.0\n"
        "1.0 1.0 1.0 255 0 255 23\n"
        "1.0 1.0 1.0 1.0 0.3 0.2 0.3\n"
    };
    std::cout << std::boolalpha;
    // matches
    std::cout << parseIt(t1.begin(), t1.end(), rule1) << std::endl;
    // 3 double 3 ints
    std::string test{"1.0 1.0 1.0 1 3 2\n"};
    // matches individually
    std::cout << parseIt(test.begin(), test.end(), vertexRule >> -(colorRule)) << std::endl;
    // offending line added at the end
    // but position does not matter
    // also offending 3 double 3 double
    std::string t2{
        "20.0 20.0 20.0\n"
        "1.0 1.0 1.0 255 0 255 23\n"
        "1.0 1.0 1.0 1.0 0.3 0.2 0.3\n"
        "1.0 1.0 1.0 1 3 2\n"
    };
    qi::rule<std::string::iterator, ascii::space_type> rule2 =
        repeat(4)[vertexRule >> -(colorRule)];
    // does not match
    std::cout << parseIt(t2.begin(), t2.end(), rule2) << std::endl;
    // interestingly this matches
    // std::string t2{
    //     "1.0 1.0 1.0 1 3 2\n"
    //     "1.0 1.0 1.0 1 3 2\n"
    //     "1.0 1.0 1.0 1 3 2\n"
    //     "1.0 1.0 1.0 1 3 2\n"
    // };
}
I'm new to parser construction and especially Boost.Spirit. So comments pointing out the obvious are also appreciated.
Your prose description and sample inputs indicate that line ends are significant in your grammar, yet nothing in your rules expresses that. Because ascii::space also skips newlines, the parser sees one long stream of numbers, so an optional color can swallow values that belong to the next line's vertex.
There is a second issue: the ambiguity between double_ and uint_ (see below).
Here is a reworked sample that uses a custom skipper (one that will not eat the eol). I also made it accept any number of trailing eol, but nothing else:
skipper = qi::char_(" \t");
bool r = qi::phrase_parse(
    begin, end,
    (vertexRule >> -colorRule) % qi::eol >> *qi::eol >> qi::eoi,
    skipper
);
Full code returning success for all parses:
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
using namespace boost::spirit;
template<typename Iterator>
bool parseIt(Iterator begin, Iterator end)
{
    qi::rule<Iterator, qi::blank_type> vertexRule, colorRule;
    vertexRule = double_ >> double_ >> double_;
    colorRule = (double_ >> double_ >> double_ >> -(double_)) | (uint_ >> uint_ >> uint_ >> -(uint_));
    bool r = qi::phrase_parse(
        begin, end,
        (vertexRule >> -colorRule) % qi::eol >> *qi::eol >> qi::eoi,
        qi::blank
    );
    if(begin != end)
    {
        std::cout << "No full match! '" << std::string(begin, end) << "'" << std::endl;
        return false;
    }
    return r;
}

int main()
{
    std::string t1
    {
        "20.0 20.0 20.0\n"
        "1.0 1.0 1.0 255 0 255 23\n"
        "1.0 1.0 1.0 1.0 0.3 0.2 0.3\n"
    };
    std::cout << std::boolalpha;
    // matches
    std::cout << parseIt(t1.begin(), t1.end()) << std::endl;
    // 3 double 3 ints
    std::string test {"1.0 1.0 1.0 1 3 2\n"};
    // matches individually
    std::cout << parseIt(test.begin(), test.end()) << std::endl;
    // offending line added at the end
    // but position does not matter
    // also offending 3 double 3 double
    std::string t2
    {
        "20.0 20.0 20.0\n"
        "1.0 1.0 1.0 255 0 255 23\n"
        "1.0 1.0 1.0 1.0 0.3 0.2 0.3\n"
        "1.0 1.0 1.0 1 3 2\n"
    };
    // did not match in the question's code; matches with the line-aware grammar
    std::cout << parseIt(t2.begin(), t2.end()) << std::endl;
    // interestingly this matches
    // std::string t2{
    //     "1.0 1.0 1.0 1 3 2\n"
    //     "1.0 1.0 1.0 1 3 2\n"
    //     "1.0 1.0 1.0 1 3 2\n"
    //     "1.0 1.0 1.0 1 3 2\n"
    // };
}
uint_ versus double_
As mentioned, there is also an ambiguity lurking here:
colorRule = (double_ >> double_ >> double_ >> -(double_)) | (uint_ >> uint_ >> uint_ >> -(uint_));
As it stands, the (uint_ >> uint_ >> uint_ >> -(uint_)) part of the rule will never be matched, because any input it accepts is also accepted by the first alternative (the one with double_), which is tried first. I'd simply rewrite this as
colorRule = double_ >> double_ >> double_ >> -double_;
That is, unless the meaning of the values changes when they are specified as floats (e.g. uints go from 0..255, but doubles go from 0.0..1.0). In that case I can see why you would want to detect integer-ness. You can achieve that by reordering the alternatives so that the integer branch is tried first:
colorRule = (uint_ >> uint_ >> uint_ >> -(uint_))
| (double_ >> double_ >> double_ >> -(double_));
To make things easier on the user of the parser, I'd simply expose the same attribute type at all times, and consider a semantic action to convert the integers to doubles using whatever conversion is appropriate:
#include <boost/spirit/include/phoenix_operator.hpp>
// ....
qi::rule<Iterator, Skipper, double()> colorInt = uint_ [ _val = _1 / 255.0 ];
colorRule = (colorInt >> colorInt >> colorInt >> -(colorInt))
| (double_ >> double_ >> double_ >> -(double_));