I am trying to binary-serialize the data of a vector. In the sample below I serialize to a string and then deserialize back to a vector, but I do not get the same data I started with. Why is this the case?
vector<size_t> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
string s((char*)(&v[0]), 3 * sizeof(size_t));
vector<size_t> w(3);
strncpy((char*)(&w[0]), s.c_str(), 3 * sizeof(size_t));
for (size_t i = 0; i < w.size(); ++i) {
    cout << w[i] << endl;
}
I expect to get the output
1
2
3
but instead get the output
1
0
0
(on gcc-4.5.1)
The error is in the call to strncpy. From the linked page:
If the length of src is less than n, strncpy() pads the remainder of dest with null bytes.
So, after the first 0 byte in the serialized data is found, the remainder of w's data array is padded with 0s.
To fix this, use a for loop, or std::copy:
std::copy( &s[0],
&s[0] + v.size() * sizeof(size_t),
reinterpret_cast<char *>(w.data()) );
IMO, instead of using std::string as a buffer, just use a char array to hold the serialized data.
strncpy is a giant pile of fail. It stops copying early on your input because the size_t values contain zero bytes, which it interprets as the NUL terminator; the rest of the destination is then padded with zeros. If you ran this test on a big-endian machine, every value would come out as 0. Use std::copy.
To serialize this vector into a string, you first want to convert each element of the vector from an integer into a string containing the ASCII representation of that number. This operation can be called serializing an integer to a string.
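So, for example, assuming a 64-bit size_t, which needs at most 20 decimal digits, we can
// create a temporary string to hold each element (20 digits + terminating '\0')
char intAsString[20 + 1];
then convert the integer to a string
snprintf(intAsString, sizeof intAsString, "%zu", v[0]); // %zu is the format for size_t
or
itoa( v[0], intAsString, 10 /*decimal number*/ ); // non-standard, and takes an int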
You can also make use of std::ostringstream and the << operator.
If you look at the memory contents of intAsString and v[0], they are very different: the first contains the ASCII characters that represent the value of v[0] in the decimal number system (base 10), while v[0] contains the binary representation of the number (because that's how computers store numbers).
The safest way would be to just loop through the vector and store the values individually into a char array of size 3*sizeof(size_t). That way you don't have a dependency on the internal structure of the vector class implementation.