Protocol buffers issue, multiple serializations in

Posted 2019-06-26 03:14

Question:

I am getting some weird behaviour from protobuf binary file io. I am pre-processing a text corpus into a protobuf intermediary file. My serialization class looks as follows:

  class pb_session_printer
  {
  public:
    pb_session_printer(std::string & filename)
      : out(filename.c_str(), std::fstream::out | std::fstream::trunc |
                              std::fstream::binary)
      {}

    void print_batch(std::vector<session> & pb_sv)
    {
      boost::lock_guard<boost::mutex> lock(m);

      BOOST_FOREACH(session & s, pb_sv)
      {
        std::cout << out.tellg() << ":";
        s.SerializeToOstream(&out);
        out.flush();
        std::cout << s.session_id() << ":" << s.action_size() << std::endl;
      }
      exit(0);
    }

    std::fstream out;
    boost::mutex m;
  };

A snippet of output looks like :

0:0:8
132:1:8
227:2:6
303:3:6
381:4:19
849:5:9
1028:6:2
1048:7:18
1333:8:28
2473:9:24

The first field is the stream offset before each write; it shows that serialization is proceeding as normal.

When I run my loading program :

int main()
{
  std::fstream in_file("out_file", std::fstream::in | std::ios::binary);
  session s;

  std::cout << in_file.tellg() << std::endl;
  s.ParseFromIstream(&in_file);
  std::cout << in_file.tellg() << std::endl;
  std::cout << s.session_id() << std::endl;

  s.ParseFromIstream(&in_file);
}

I get:

0
-1
111
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type 
"session" because it is missing required fields: session_id

session_id 111 is an entry towards the end of the stream. I clearly don't understand the semantics of the library's binary I/O facilities. Please help.

Answer 1:

If you write multiple protobuf messages to a single file, you need to write each message's size followed by the message itself, and read them back separately (so without ParseFromIstream, as Cat Plus Plus mentioned). Once you have read a message's bytes into a buffer, you can parse them with ParseFromArray.

Your file would look like this (the spaces are just for readability):

size protobuf size protobuf size protobuf etc.



Answer 2:

Message::ParseFromIstream is documented to consume the entire input. Since you're serialising a sequence of messages of the same type, you can just create a new message with a repeated field of that type, and work with that.