Google Protocol Buffer serialized string can contain embedded NULL characters

Posted 2019-08-12 07:40

Question:

I am using Google Protocol Buffers for message serialization. This is my sample .proto file:

package MessageParam;

message Sample
{
    message WordRec
    {
        optional uint64 id = 1; 
        optional string word = 2;
        optional double value = 3;
    }
    message WordSequence
    {
        repeated WordRec WordSeq = 1;
    }
}

I am trying to serialize the message in C++ as follows:

MessageParam::Sample::WordSequence wordseq;
for (int i = 0; i < 10; i++)
{
    AddRecords(wordseq.add_wordseq());
}
std::string str = wordseq.SerializeAsString();

After executing the statement above, str.size() is 430, and the string contains embedded null characters. When I try to assign str to a std::wstring, the std::wstring is cut off at the first null character.
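The size of 430 already shows that std::string itself stores the NUL bytes without trouble; truncation typically happens only when the bytes pass through an API that expects a NUL-terminated C string (a `const char*` with no length). The question does not show how the conversion to std::wstring is done, so this is only a sketch of that likely cause, assuming such a pointer-only step is involved; `truncated_copy`, `full_copy` and `demo_truncation` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: goes through c_str(), so the std::string(const char*)
// constructor stops copying at the first '\0' byte.
std::string truncated_copy(const std::string& bytes) {
    return std::string(bytes.c_str());
}

// Hypothetical helper: passes the explicit length, so every byte is kept.
std::string full_copy(const std::string& bytes) {
    return std::string(bytes.data(), bytes.size());
}

// Usage example: a 5-byte buffer with a NUL in the middle, as a
// serialized message can contain.
void demo_truncation() {
    const std::string bytes("ab\0cd", 5);
    assert(bytes.size() == 5);                  // std::string keeps the NUL
    assert(truncated_copy(bytes).size() == 2);  // only "ab" survives
    assert(full_copy(bytes) == bytes);          // all 5 bytes survive
}
```

A routine that receives only a pointer and no length has no way to know where the data ends except by looking for a terminating NUL, which is why the length-aware form keeps the whole buffer.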

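#include <cstdint>
#include <iostream>
#include <string>

void AddRecords(MessageParam::Sample::WordRec* wordrec)
{
    std::uint64_t id;  // matches the uint64 field type
    std::cin >> id;
    wordrec->set_id(id);
    // Skip the whitespace/newline left behind by >> before reading the word.
    std::getline(std::cin >> std::ws, *wordrec->mutable_word());
    double value;  // the field is a double; a long would drop the fraction
    std::cin >> value;
    wordrec->set_value(value);
}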

The value of wordseq.DebugString() is:

WordSeq { id: 4 word: "software" value: 1 }
WordSeq { id: 19 word: "technical" value: 0.70992374420166016 }
WordSeq { id: 51 word: "hardware" value: 0.626017153263092 }

How can I serialize "wordseq" as a string that contains embedded NULL characters?

Answer 1:

You should not try to store a Protobuf in a wstring. wstring is for storing Unicode text, but a protobuf is not Unicode text, nor any other kind of text; it is raw bytes. You should keep it in byte form. If you really need to store a Protobuf in a textual context, you should base64-encode it first.
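Base64 (RFC 4648) maps every 3 input bytes to 4 printable ASCII characters, so the text form contains no NUL bytes and the original bytes can be recovered exactly. Below is a minimal, dependency-free sketch of an encoder and decoder; `base64_encode`, `base64_decode` and `demo_base64` are hypothetical names, not part of the protobuf API. In real code an existing library routine is usually preferable (Abseil's `absl::Base64Escape` and `absl::Base64Unescape`, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace {
const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Maps a base64 character back to its 6-bit value, or -1 if invalid.
int decode_char(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}
}  // namespace

// Encodes arbitrary bytes (embedded NULs included) as padded base64 text.
std::string base64_encode(const std::string& bytes) {
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 2 < bytes.size(); i += 3) {  // full 3-byte groups
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                     static_cast<unsigned char>(bytes[i + 2]);
        out += kAlphabet[(n >> 18) & 0x3F];
        out += kAlphabet[(n >> 12) & 0x3F];
        out += kAlphabet[(n >> 6) & 0x3F];
        out += kAlphabet[n & 0x3F];
    }
    const std::size_t rest = bytes.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest > 0) {
        unsigned n = static_cast<unsigned char>(bytes[i]) << 16;
        if (rest == 2) n |= static_cast<unsigned char>(bytes[i + 1]) << 8;
        out += kAlphabet[(n >> 18) & 0x3F];
        out += kAlphabet[(n >> 12) & 0x3F];
        out += rest == 2 ? kAlphabet[(n >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}

// Decodes padded base64 into *bytes; returns false on malformed input.
bool base64_decode(const std::string& text, std::string* bytes) {
    if (text.size() % 4 != 0) return false;
    bytes->clear();
    for (std::size_t i = 0; i < text.size(); i += 4) {
        int v[4];
        int pad = 0;
        for (int k = 0; k < 4; ++k) {
            const char c = text[i + k];
            // '=' is only allowed in the last two slots of the final group.
            if (c == '=' && i + 4 == text.size() && k >= 2) {
                v[k] = 0;
                ++pad;
            } else {
                if (pad > 0) return false;  // data after padding
                v[k] = decode_char(c);
                if (v[k] < 0) return false;
            }
        }
        const unsigned n = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3];
        bytes->push_back(static_cast<char>((n >> 16) & 0xFF));
        if (pad < 2) bytes->push_back(static_cast<char>((n >> 8) & 0xFF));
        if (pad < 1) bytes->push_back(static_cast<char>(n & 0xFF));
    }
    return true;
}

// Usage example: bytes containing a NUL round-trip through text.
void demo_base64() {
    const std::string bytes("\0\x01\xff", 3);
    const std::string text = base64_encode(bytes);  // printable ASCII only
    std::string decoded;
    assert(base64_decode(text, &decoded) && decoded == bytes);
}
```

Because the encoded text is pure ASCII, widening it character by character (`std::wstring wide(text.begin(), text.end());`) is lossless if a std::wstring is truly required; narrow it back the same way and decode it before parsing.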

Arguably Protobufs' use of std::string to store bytes (rather than text) is confusing. Perhaps it should have used std::vector<unsigned char> all along. You should treat protobufs' std::strings like you would std::vector<unsigned char>.
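In practice, treating the string as a byte vector means touching it only through its iterators or its `data()`/`size()` pair, never through `c_str()`. A minimal sketch of moving the serialized bytes into and out of a `std::vector<unsigned char>` without losing embedded NULs; `to_bytes`, `from_bytes` and `demo_byte_vector` are hypothetical helper names, while `SerializeAsString()` and `ParseFromString()` are the protobuf message methods mentioned in the comments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies serialized bytes (for example the result of
// SerializeAsString()) into a byte vector. The iterator range spans all
// size() bytes, so embedded '\0' values are copied like any other byte.
std::vector<unsigned char> to_bytes(const std::string& serialized) {
    return std::vector<unsigned char>(serialized.begin(), serialized.end());
}

// Hypothetical helper: rebuilds a std::string from the byte vector, for
// example before handing it back to a message's ParseFromString().
std::string from_bytes(const std::vector<unsigned char>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Usage example: NUL bytes survive the round trip unchanged.
void demo_byte_vector() {
    const std::string serialized("x\0y\0", 4);
    const std::vector<unsigned char> bytes = to_bytes(serialized);
    assert(bytes.size() == 4 && bytes[1] == 0 && bytes[3] == 0);
    assert(from_bytes(bytes) == serialized);
}
```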