Is Boost skip parser the right approach?

2019-08-05 17:43发布

问题:

After some delay I'm now again trying to parse some ASCII text file surrounded by some binary characters.

Parsing text file with binary envelope using boost Spririt

However I'm now struggling if a skip parser is the right approach?

The grammar of the file (it's a JEDEC file) is quite simple:

Each data field in the file starts with a single letter and ends with an asterisk. The data field can contain spaces and carriage return. After the asterisk spaces and carriage return might follow too before the next field identifier.

This is what I used to start building a parser for such a file:

phrase_parse(first, last, 
             // First char in File
             char_('\x02') >>

             // Data field
             *((print[cout << _1] | graph[cout << _1]) - char_('*')) >>

             // End of data followed by 4 digit hexnumber. How to limit?
             char_('\x03') >> *xdigit,

             // Skip asterisks
             char_('*') );

Unfortunately I don't get any output from this one. Does someone have an idea what might be wrong?

Sample file:

<STX>
JEDEC file generated by John Doe*
DM SIGNETICS(PHILIPS)*
DD GAL16R8*
QP20*
QV0*
G0*F0*
L00000 1110101111100110111101101110111100111111*
CDEAD*
<ETX>BEEF

and this is what I want to achive:

Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF

回答1:

I would suggest you want to use a skipper at the toplevel rule only. And use it to skip the insignificant whitespace.

You don't use a skipper for the asterisks because you do not want to ignore them. If they're ignored, your rules cannot act upon them.

Furthermore the inner rules should not use the space skipper for the simple reason that whitespace and linefeeds are valid field data in JEDEC.

So, the upshot of all this would be:

value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum; 

Where the rules would be declared with the respective skippers:

qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;
qi::rule<It, ascii::space_type> start;
qi::rule<It>             field, value; // no skippers - they are lexemes

Take-away: Split your grammar up in rules. Be happier for it.

Processing the results

Your sample needlessly mixes responsibilities for parsing and "printing". I'd suggest not using semantic actions here (Boost Spirit: "Semantic actions are evil"?).

Instead, declare appropriate attribute types:

struct JEDEC {
    std::string caption;
    struct field { 
        char id;
        std::string value;
    };
    std::vector<field> fields;
    uint16_t checksum;
};

And declare them in your rules:

qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()>             field;
qi::rule<It, std::string()>                   value;
qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;

Now, nothing needs to be changed in your grammar, and you can print the desired output with:

inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
    os << "Start: " << jedec.caption << "\n";
    for(auto& f : jedec.fields)
        os << f.id << ": " << f.value << "\n";

    auto saved = os.rdstate();
    os << "End: " << std::hex << std::setw(4) << std::setfill('0') << jedec.checksum;
    os.setstate(saved);

    return os;
}

LIVE DEMO

Here's a demo program that ties it together using the sample input from your question:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>

namespace qi = boost::spirit::qi;
namespace ascii = qi::ascii;

namespace ast {
    struct JEDEC {
        std::string caption;
        struct field { 
            char id;
            std::string value;
        };
        std::vector<field> fields;
        uint16_t checksum;
    };

    inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
        os << "Start: " << jedec.caption << "\n";
        for(auto& f : jedec.fields)
            os << f.id << ": " << f.value << "\n";

        auto saved = os.rdstate();
        os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
        os.setstate(saved);

        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC::field,
        (char, id)(std::string, value))
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC,
        (std::string, caption)
        (std::vector<ast::JEDEC::field>, fields)
        (uint16_t, checksum))

template <typename It> 
struct JedecGrammar : qi::grammar<It, ast::JEDEC(), ascii::space_type>
{
    JedecGrammar() : JedecGrammar::base_type(start) {
        const char STX = '\x02';
        const char ETX = '\x03';

        value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
        field = ascii::graph >> value;
        start = STX >> value >> *field >> ETX >> xmit_checksum; 

        BOOST_SPIRIT_DEBUG_NODES((start)(field)(value))
    }
  private:
    qi::rule<It, ast::JEDEC(), ascii::space_type> start;
    qi::rule<It, ast::JEDEC::field()>             field;
    qi::rule<It, std::string()>                   value;
    qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;
};

int main() {
    typedef boost::spirit::istream_iterator It;
    It first(std::cin>>std::noskipws), last;

    JedecGrammar<It> g;

    ast::JEDEC jedec;
    bool ok = phrase_parse(first, last, g, ascii::space, jedec);

    if (ok)
    {
        std::cout << "Parse success\n";
        std::cout << jedec;
    }
    else
        std::cout << "Parse failed\n";

    if (first != last)
        std::cout << "Remaining input unparsed: '" << std::string(first, last) << "'\n";
}

Output:

Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF

Take-away: See your dentist twice a year.