I am getting some weird behaviour from protobuf binary file I/O. I am pre-processing a text corpus into an intermediate protobuf file. My serialization class looks as follows:
class pb_session_printer
{
public:
    pb_session_printer(std::string & filename)
        : out(filename.c_str(), std::fstream::out | std::fstream::trunc |
                                std::fstream::binary)
    {}

    void print_batch(std::vector<session> & pb_sv)
    {
        boost::lock_guard<boost::mutex> lock(m);
        BOOST_FOREACH(session & s, pb_sv)
        {
            // Offset in the file before this message is written.
            std::cout << out.tellp() << ":";
            s.SerializeToOstream(&out);
            out.flush();
            std::cout << s.session_id() << ":" << s.action_size() << std::endl;
        }
        exit(0); // stops the program after the first batch
    }

    std::fstream out;
    boost::mutex m;
};
A snippet of the output looks like this:
0:0:8
132:1:8
227:2:6
303:3:6
381:4:19
849:5:9
1028:6:2
1048:7:18
1333:8:28
2473:9:24
The first field is the stream offset before each write. It grows steadily, which suggests that serialization is proceeding normally.
When I run my loading program:
int main()
{
    std::fstream in_file("out_file", std::fstream::in | std::ios::binary);
    session s;
    std::cout << in_file.tellg() << std::endl;
    s.ParseFromIstream(&in_file);
    std::cout << in_file.tellg() << std::endl;
    std::cout << s.session_id() << std::endl;
    s.ParseFromIstream(&in_file);
}
I get:
0
-1
111
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type
"session" because it is missing required fields: session_id
The session with session_id 111 is an entry towards the end of the stream. I clearly don't understand the semantics of the library's binary I/O facilities. Please help.
Message::ParseFromIstream is documented to consume the entire input. That is why the first call runs to the end of the file: the concatenated messages are parsed as one merged message, so the last session_id written wins. Since you're serialising a sequence of messages of the same type, you can just create a new message with a repeated field of that type, and work with that.

If you write multiple protobuffers to a single file, you need to write the size of each protobuf followed by the protobuf itself, and read them back in separately (so without ParseFromIstream, as Cat Plus Plus mentioned). Once you have read a protobuffer into memory, you can parse it with ParseFromArray.

Your file would look like this (the spaces are just for readability):
size protobuf size protobuf size protobuf etc.
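
The size-then-payload layout above can be sketched roughly as follows. This is only a minimal illustration, not protobuf's own API: the helper names write_frame, read_frame and demo_round_trip are hypothetical, and the fixed 4-byte little-endian size prefix is an assumption (any agreed-upon size encoding would do). To keep the sketch free of generated code, the payload is a plain std::string; with protobuf you would presumably pass s.SerializeAsString() to the writer and hand the bytes you read back to ParseFromArray or ParseFromString.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write one record as a 4-byte little-endian size followed by the payload.
// Assumption: with protobuf, `payload` would come from s.SerializeAsString().
void write_frame(std::ostream& out, const std::string& payload)
{
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    const char header[4] = {
        static_cast<char>(n & 0xFF),
        static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 24) & 0xFF)
    };
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read one record into `payload`.
// Returns false at end of stream or if the record is truncated.
bool read_frame(std::istream& in, std::string& payload)
{
    unsigned char header[4];
    if (!in.read(reinterpret_cast<char*>(header), 4))
        return false;

    const std::uint32_t n =
          static_cast<std::uint32_t>(header[0])
        | (static_cast<std::uint32_t>(header[1]) << 8)
        | (static_cast<std::uint32_t>(header[2]) << 16)
        | (static_cast<std::uint32_t>(header[3]) << 24);

    payload.resize(n);
    if (n > 0 && !in.read(&payload[0], static_cast<std::streamsize>(n)))
        return false;
    return true;
}

// Round trip through an in-memory stream and return the number of records
// read back. A file stream opened with std::ios::binary is used the same way.
int demo_round_trip()
{
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    write_frame(buf, "first");
    write_frame(buf, "");                      // empty payloads are allowed
    write_frame(buf, std::string("a\0b", 3));  // payloads may contain NUL bytes

    int count = 0;
    std::string payload;
    while (read_frame(buf, payload))
    {
        if (count == 0) assert(payload == "first");
        if (count == 1) assert(payload.empty());
        if (count == 2) assert(payload == std::string("a\0b", 3));
        ++count;
    }
    return count;
}
```

In the loader from the question, the loop would then read one frame at a time and parse each buffer into a fresh session, instead of calling ParseFromIstream on the whole file. The size prefix is what lets the reader know where one message ends, since the protobuf wire format does not delimit messages itself.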