Parsing text file with binary envelope using boost

Posted 2020-04-18 05:50

Question:

I'm currently trying to write a parser for an ASCII text file that is surrounded by a small envelope with checksum.

The basic structure of the file is: <0x02><"File payload"><0x03><16bit CRC>

and I want to extract the payload in another string to feed it to the next parser.

The parser expression I use to parse this envelope is:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print >> char_('\x03') >> *xdigit,
    space
);

The input is consumed... and I already tried to dump out the payload:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print[cout << _1] >> char_('\x03') >> *xdigit,
    space
);

But the problem is that every newline, blank, etc. is omitted!

Now my questions:

  1. How do I extract the content between the 0x02/0x03 (STX/ETX) bytes correctly, without omitting spaces, newlines, etc.?

  2. Is my approach of first removing the envelope and then parsing the payload a good one, or is there a better approach I should use?

Answer 1:

Use e.g. qi::seek or qi::confix to get you started. Both are part of the Spirit repository (http://www.boost.org/doc/libs/1_57_0/libs/spirit/repository/doc/html/spirit_repository/qi_components/directives/confix.html).

But the problem is that every newline, blank, etc. is omitted!

Well, that's what a skipper does. Don't use one, or:

Use qi::raw[]

To extract the intervening text, I suggest using qi::raw, which exposes the matched input as a pair of source iterators. I'm not sure you actually want to copy it to a string, though (copying sounds expensive). Copying probably does make sense when the source is a stream (or some other source of input iterators).

Seminal rule:

myrule = '\x02' > raw [ *(char_ - '\x03') ] > '\x03';

You could add the checksumming:

myrule = '\x02' > raw [ *(char_ - '\x03') ] [ _a = _checksum(_1) ] > '\x03' >> qi::word(_a);

Assuming

  • the rule is declared with qi::locals<uint16_t>, so that _a can hold the computed checksum
  • _checksum is a suitable Phoenix functor that takes a pair of source iterators and returns uint16_t

Of course you might prefer to keep checksumming outside the parser.
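As an illustration of keeping the checksum outside the parser, the following sketch strips the envelope with plain std::string operations and verifies the CRC before the payload goes to the next parser. The question does not say which CRC is used or how it is encoded, so everything specific here is an assumption: CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) computed over the payload only, stored as two raw bytes, high byte first. The names crc16_ccitt and unwrap_envelope are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF).
// Swap in whatever CRC the file format really specifies.
std::uint16_t crc16_ccitt(const std::string& data)
{
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < data.size(); ++i) {
        crc ^= static_cast<std::uint16_t>(
            static_cast<unsigned char>(data[i]) << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000)
                ? static_cast<std::uint16_t>((crc << 1) ^ 0x1021)
                : static_cast<std::uint16_t>(crc << 1);
        }
    }
    return crc;
}

// Hypothetical helper: expects <0x02> payload <0x03> <crc-hi> <crc-lo>.
// Returns false on malformed framing or on a checksum mismatch.
bool unwrap_envelope(const std::string& frame, std::string& payload)
{
    if (frame.size() < 4 || frame[0] != '\x02') return false;

    // The payload is ASCII text, so the first 0x03 after STX terminates it.
    std::size_t const etx = frame.find('\x03', 1);
    if (etx == std::string::npos || etx + 3 != frame.size()) return false;

    std::string const body = frame.substr(1, etx - 1);

    // Assumption: the CRC is stored big-endian (high byte first).
    std::uint16_t const stored = static_cast<std::uint16_t>(
        (static_cast<unsigned char>(frame[etx + 1]) << 8) |
         static_cast<unsigned char>(frame[etx + 2]));

    if (stored != crc16_ccitt(body)) return false;
    payload = body;
    return true;
}
```

Because nothing here skips characters, spaces and newlines in the payload survive unchanged, and the payload parser only ever sees data whose checksum has already been verified.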