I have a small hierarchy of objects that I need to serialize and transmit via a socket connection. I need to both serialize the object, then deserialize it based on what type it is. Is there an easy way to do this in C++ (as there is in Java)?
Are there any C++ serialization online code samples or tutorials?
EDIT: Just to be clear, I'm looking for methods on converting an object into an array of bytes, then back into an object. I can handle the socket transmission.
Talking about serialization, the boost serialization API comes to my mind. As for transmitting the serialized data over the net, I'd either use Berkeley sockets or the asio library.
Edit:
If you want to serialize your objects to a byte array, you can use the boost serializer in the following way (taken from the tutorial site):
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
class gps_position
{
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & degrees;
ar & minutes;
ar & seconds;
}
int degrees;
int minutes;
float seconds;
public:
gps_position(){};
gps_position(int d, int m, float s) :
degrees(d), minutes(m), seconds(s)
{}
};
Actual serialization is then pretty easy:
#include <fstream>
std::ofstream ofs("filename.dat", std::ios::binary);
// create class instance
const gps_position g(35, 59, 24.567f);
// save data to archive
{
boost::archive::binary_oarchive oa(ofs);
// write class instance to archive
oa << g;
// archive and stream closed when destructors are called
}
Deserialization works in an analogous manner.
There are also mechanisms that let you handle serialization of pointers (complex data structures like trees etc. are no problem) and derived classes, and you can choose between binary and text serialization. Besides, all STL containers are supported out of the box.
In some cases, when dealing with simple types, you can do:
object o;
socket.write(&o, sizeof(o));
That's ok as a proof-of-concept or first draft, so other members of your team can keep working on other parts.
But sooner or later, usually sooner, this will get you hurt!
You run into issues with:
- Virtual pointer tables will be corrupted.
- Pointers (to data/members/functions) will be corrupted.
- Differences in padding/alignment on different machines.
- Big/Little-Endian byte ordering issues.
- Variations in the implementation of float/double.
(Plus you need to know what you are unpacking into on the receiving side.)
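A minimal sketch of that raw-copy shortcut, assuming a hypothetical pointer-free Sample struct. The static_assert rejects types with virtual functions or non-trivial copy operations, but it cannot catch raw pointer members, padding, endianness or float-format differences, so the bytes only round-trip reliably between identical builds on the same kind of machine.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical message type: fixed-size members only, no pointers.
struct Sample {
    std::int32_t id;
    float value;
};

// Copy the object's bytes exactly as they sit in memory on this machine.
template <typename T>
std::vector<std::uint8_t> raw_bytes(const T& obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw memcpy is only valid for trivially copyable types");
    std::vector<std::uint8_t> out(sizeof(T));
    std::memcpy(out.data(), &obj, sizeof(T));
    return out;
}

// Copy the bytes back into a fresh object of the same type.
template <typename T>
T from_raw_bytes(const std::vector<std::uint8_t>& in)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw memcpy is only valid for trivially copyable types");
    if (in.size() < sizeof(T)) throw std::out_of_range("not enough bytes");
    T obj{};
    std::memcpy(&obj, in.data(), sizeof(T));
    return obj;
}
```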
You can improve upon this by developing your own marshalling/unmarshalling methods for every class (ideally virtual, so they can be extended in subclasses). A few simple macros will let you write out different basic types quite quickly in a big/little-endian-neutral order.
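A sketch of such per-class methods, assuming hypothetical Shape and Rect classes. Plain inline functions stand in for the macros, and big-endian (network) order is an arbitrary choice; what matters is that it is fixed. Each subclass calls the base version first, then appends or reads its own fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Endian-neutral helpers: the byte order is fixed by the shifts, not the host.
inline void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v)
{
    out.push_back(static_cast<std::uint8_t>(v >> 8));    // high byte first
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

inline std::uint16_t get_u16(const std::vector<std::uint8_t>& in, std::size_t& pos)
{
    if (pos + 2 > in.size()) throw std::out_of_range("truncated input");
    std::uint16_t v = static_cast<std::uint16_t>((in[pos] << 8) | in[pos + 1]);
    pos += 2;
    return v;
}

class Shape {  // hypothetical base class
public:
    explicit Shape(std::uint16_t id = 0) : id_(id) {}
    virtual ~Shape() = default;
    virtual void marshal(std::vector<std::uint8_t>& out) const { put_u16(out, id_); }
    virtual void unmarshal(const std::vector<std::uint8_t>& in, std::size_t& pos) { id_ = get_u16(in, pos); }
    std::uint16_t id() const { return id_; }
private:
    std::uint16_t id_;
};

class Rect : public Shape {  // subclass extends the base encoding
public:
    Rect(std::uint16_t id = 0, std::uint16_t w = 0, std::uint16_t h = 0) : Shape(id), w_(w), h_(h) {}
    void marshal(std::vector<std::uint8_t>& out) const override
    {
        Shape::marshal(out);  // base fields first
        put_u16(out, w_);
        put_u16(out, h_);
    }
    void unmarshal(const std::vector<std::uint8_t>& in, std::size_t& pos) override
    {
        Shape::unmarshal(in, pos);
        w_ = get_u16(in, pos);
        h_ = get_u16(in, pos);
    }
    std::uint16_t w() const { return w_; }
    std::uint16_t h() const { return h_; }
private:
    std::uint16_t w_, h_;
};
```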
But that sort of grunt work is much better, and more easily, handled via boost's serialization library.
Serialization means turning your object into binary data, while deserialization means recreating an object from that data.
When serializing, you are pushing bytes into a uint8_t vector.
When unserializing, you are reading bytes from a uint8_t vector.
There are certainly patterns you can employ when serializing stuff.
Each serializable class should have a serialize(std::vector<uint8_t> &binaryData) or similarly declared function that writes its binary representation into the provided vector. This function may then pass the vector down to its members' serializing functions so they can write their own data into it too.
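A minimal sketch of that pattern, with hypothetical Point and Segment types: the outer object hands the same vector to each member in a fixed order, then appends its own field.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical member type with its own serialize().
struct Point {
    std::uint8_t x, y;
    void serialize(std::vector<std::uint8_t>& binaryData) const
    {
        binaryData.push_back(x);
        binaryData.push_back(y);
    }
};

// The containing type delegates to its members, always in the same order.
struct Segment {
    Point from, to;
    std::uint8_t color;
    void serialize(std::vector<std::uint8_t>& binaryData) const
    {
        from.serialize(binaryData);
        to.serialize(binaryData);
        binaryData.push_back(color);
    }
};
```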
Since the data representation can differ between architectures, you need to define a scheme for how to represent the data.
Let's start from the basics:
Serializing integer data
Just write the bytes in little-endian order, or use a varint representation if size matters.
Serialization in little endian order:
data.push_back(integer32 & 0xFF);
data.push_back((integer32 >> 8) & 0xFF);
data.push_back((integer32 >> 16) & 0xFF);
data.push_back((integer32 >> 24) & 0xFF);
Deserialization from little endian order:
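// cast each byte first: data[3] << 24 on a promoted int can overflow
integer32 = uint32_t(data[0]) | (uint32_t(data[1]) << 8) | (uint32_t(data[2]) << 16) | (uint32_t(data[3]) << 24);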
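The varint option mentioned above can be sketched as an unsigned LEB128-style encoding (the scheme Protocol Buffers uses for its varints): 7 data bits per byte, least significant group first, with the high bit marking that more bytes follow. The function names are hypothetical, and signed values would need an extra mapping such as zigzag, which is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Small values take 1 byte, a full 64-bit value takes up to 10 bytes.
void write_varint(std::vector<std::uint8_t>& data, std::uint64_t value)
{
    while (value >= 0x80) {
        data.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));  // more bytes follow
        value >>= 7;
    }
    data.push_back(static_cast<std::uint8_t>(value));  // last byte: high bit clear
}

std::uint64_t read_varint(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= data.size()) throw std::out_of_range("truncated varint");
        std::uint8_t byte = data[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    throw std::runtime_error("varint too long");
}
```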
Serializing floating point data
As far as I know, IEEE 754 has a monopoly here. I don't know of any mainstream architecture that uses something else for floats. The only thing that can differ is the byte order: some architectures use little-endian, others big-endian byte order. This means you need to be careful about the order in which you load the bytes on the receiving end. Another difference can be the handling of denormal, infinity and NaN values, but as long as you avoid these values you should be OK.
Serialization:
uint8_t mem[sizeof(double)];
memcpy(mem, &doubleValue, sizeof(double));  // copy the bytes, in host byte order
for (uint8_t b : mem)
    data.push_back(b);
Deserialization is doing it backward. Mind the byte order of your architecture!
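One way to sidestep the host byte order, assuming IEEE 754 doubles (checked with static_assert) and that doubles and 64-bit integers share the same byte order, which holds on mainstream platforms: copy the bit pattern into a uint64_t and emit it with shifts, as for integers. The little-endian wire order and the function names are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit doubles");

// Shifting the integer fixes the wire order regardless of host endianness.
void write_double(std::vector<std::uint8_t>& data, double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; ++i)
        data.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));  // low byte first
}

double read_double(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    if (pos + 8 > data.size()) throw std::out_of_range("truncated double");
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(data[pos + i]) << (8 * i);
    pos += 8;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```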
Serializing strings
First you need to agree on an encoding; UTF-8 is common. Then store the string in a length-prefixed manner: first store the length of the string using one of the integer methods mentioned above, then write the string byte by byte.
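A sketch of the length-prefixed layout, assuming a 32-bit little-endian length and std::string values that already contain UTF-8 (the bytes are copied unchanged, not validated). All function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

void write_u32(std::vector<std::uint8_t>& data, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        data.push_back(static_cast<std::uint8_t>(v >> (8 * i)));  // little-endian
}

std::uint32_t read_u32(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    if (pos + 4 > data.size()) throw std::out_of_range("truncated length");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(data[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Length prefix first, then the raw UTF-8 bytes.
void write_string(std::vector<std::uint8_t>& data, const std::string& s)
{
    write_u32(data, static_cast<std::uint32_t>(s.size()));
    data.insert(data.end(), s.begin(), s.end());
}

std::string read_string(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    std::uint32_t len = read_u32(data, pos);
    if (len > data.size() - pos) throw std::out_of_range("truncated string");
    std::string s(data.begin() + pos, data.begin() + pos + len);
    pos += len;
    return s;
}
```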
Serializing arrays.
They work the same way as strings: you first serialize an integer representing the size of the array, then serialize each object in it.
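A generic sketch under the same assumptions (32-bit little-endian count, hypothetical function names): the caller supplies the per-element writer and reader, so the same helpers work for arrays of any serializable type.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

void write_u32(std::vector<std::uint8_t>& data, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        data.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t read_u32(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    if (pos + 4 > data.size()) throw std::out_of_range("truncated input");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(data[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Element count first, then each element via the supplied encoder.
template <typename T, typename WriteElem>
void write_array(std::vector<std::uint8_t>& data, const std::vector<T>& items, WriteElem write_elem)
{
    write_u32(data, static_cast<std::uint32_t>(items.size()));
    for (const T& item : items) write_elem(data, item);
}

// No reserve(count): a corrupt count should fail on read, not allocate hugely.
template <typename T, typename ReadElem>
std::vector<T> read_array(const std::vector<std::uint8_t>& data, std::size_t& pos, ReadElem read_elem)
{
    std::uint32_t count = read_u32(data, pos);
    std::vector<T> items;
    for (std::uint32_t i = 0; i < count; ++i) items.push_back(read_elem(data, pos));
    return items;
}
```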
Serializing whole objects
As I said before, they should have a serialize method that adds their content to a vector.
To unserialize an object, it should have a constructor that takes a byte stream. It can be an istream, but in the simplest case it can be just a reference to a uint8_t pointer. The constructor reads the bytes it wants from the stream and sets up the fields in the object.
If the system is well designed and serializes the fields in the order they are declared in the class, you can just pass the stream to the fields' constructors in an initializer list and have them deserialized in the right order, because members are always initialized in declaration order.
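A sketch of the constructor-from-bytes approach using the "reference to a uint8_t pointer" variant, with hypothetical Pixel and Color types. Bounds checking is omitted to keep it short; real code must check the remaining length. It relies on the members being declared in the same order that serialize() writes them.

```cpp
#include <cstdint>
#include <vector>

// Consumes one byte and advances the caller's pointer (no bounds check).
std::uint8_t read_u8(const std::uint8_t*& p) { return *p++; }

struct Color {
    std::uint8_t r, g, b;
    explicit Color(const std::uint8_t*& p) : r(read_u8(p)), g(read_u8(p)), b(read_u8(p)) {}
    void serialize(std::vector<std::uint8_t>& out) const
    {
        out.push_back(r);
        out.push_back(g);
        out.push_back(b);
    }
};

struct Pixel {
    // Declaration order x, y, color matches the order serialize() writes.
    std::uint8_t x, y;
    Color color;
    explicit Pixel(const std::uint8_t*& p) : x(read_u8(p)), y(read_u8(p)), color(p) {}
    void serialize(std::vector<std::uint8_t>& out) const
    {
        out.push_back(x);
        out.push_back(y);
        color.serialize(out);  // nested object continues from the same position
    }
};
```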
Serializing object graphs
First, make sure these objects are really something you want to serialize. You don't need to serialize them if instances of these objects are already present on the destination.
Now suppose you have found that you do need to serialize an object pointed to by a pointer.
The problem with pointers is that they are valid only in the program that uses them. You cannot serialize a pointer, so you should stop using them in serializable objects. Instead, create object pools.
This object pool is basically a dynamic array that contains "boxes". These boxes have a reference count: a non-zero reference count indicates a live object, zero indicates an empty slot. Then you create a smart pointer akin to shared_ptr that doesn't store the pointer to the object, but its index in the array. You also need to agree on an index that denotes the null pointer, e.g. -1.
Basically what we did here is replaced the pointers with array indexes.
Now, when serializing, you can serialize this array index as usual. You don't need to worry about where the object will be in memory on the destination system; just make sure it has the same object pool too.
So we need to serialize the object pools. But which ones? When you serialize an object graph, you are not serializing just one object, you are serializing an entire system. This means the serialization of the system shouldn't start from parts of the system. The individual objects shouldn't worry about the rest of the system; they only need to serialize their array indexes and that's it. You should have a system serializer routine that orchestrates the serialization of the system, walks through the relevant object pools, and serializes all of them.
On the receiving end, all the arrays and the objects within them are deserialized, recreating the desired object graph.
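A much-reduced sketch of the pool idea, assuming a hypothetical Node whose "pointer" to the next node is an int32_t index (with -1 as null) and 32-bit little-endian fields. The index-based smart pointer that maintains the reference counts is left out; the point is that the system-level serialize_pool writes the whole array, so the indices stay valid on the receiving side.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::int32_t kNull = -1;  // agreed index meaning "null pointer"

struct Node {
    std::int32_t value;
    std::int32_t next;  // index of the next node in the pool, or kNull
};

struct Box {
    std::uint32_t refcount;  // 0 means the slot is empty
    Node node;
};

void put32(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t get32(const std::vector<std::uint8_t>& in, std::size_t& pos)
{
    if (pos + 4 > in.size()) throw std::out_of_range("truncated pool");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// System-level serializer: writes every slot, live or empty, so indices
// stored inside nodes refer to the same slots after deserialization.
void serialize_pool(const std::vector<Box>& pool, std::vector<std::uint8_t>& out)
{
    put32(out, static_cast<std::uint32_t>(pool.size()));
    for (const Box& b : pool) {
        put32(out, b.refcount);
        put32(out, static_cast<std::uint32_t>(b.node.value));  // two's complement bits
        put32(out, static_cast<std::uint32_t>(b.node.next));
    }
}

std::vector<Box> deserialize_pool(const std::vector<std::uint8_t>& in, std::size_t& pos)
{
    std::uint32_t count = get32(in, pos);
    std::vector<Box> pool;
    for (std::uint32_t i = 0; i < count; ++i) {
        Box b{};
        b.refcount = get32(in, pos);
        b.node.value = static_cast<std::int32_t>(get32(in, pos));
        b.node.next = static_cast<std::int32_t>(get32(in, pos));
        pool.push_back(b);
    }
    return pool;
}
```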
Serializing function pointers
Don't store function pointers in the object. Have a static array that contains the pointers to these functions and store the index in the object instead.
Since both programs have this table compiled into themselves, using just the index should work, as long as both sides build the table with the same entries in the same order.
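A small sketch of the table approach, with hypothetical handler functions add_one and twice: the object stores a one-byte index, and the index is range-checked before the call, since it arrives from the network.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical callbacks; both programs must compile the same table
// in the same order, or an index will mean different functions.
int add_one(int x) { return x + 1; }
int twice(int x) { return x * 2; }

using Handler = int (*)(int);
const Handler kHandlers[] = {add_one, twice};
constexpr std::uint8_t kHandlerCount = sizeof(kHandlers) / sizeof(kHandlers[0]);

struct Job {
    std::uint8_t handler;  // index into kHandlers instead of a raw pointer
    int arg;
    int run() const
    {
        if (handler >= kHandlerCount) throw std::out_of_range("unknown handler");
        return kHandlers[handler](arg);
    }
};
```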
Serializing polymorphic types
Since I said you should avoid pointers in serializable types and use array indexes instead, polymorphism just cannot work, because it requires pointers.
You need to work around this with type tags and unions.
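A sketch of the type-tag approach, using C++17 std::variant as a type-safe stand-in for a raw union and hypothetical Circle and Square types. Explicit tag values are written before the payload, so the wire format does not depend on the order of the variant's alternatives.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical shape types; std::variant plays the role of the union.
struct Circle { std::uint8_t radius; };
struct Square { std::uint8_t side; };
using Shape = std::variant<Circle, Square>;

// Explicit wire tags, independent of the variant's alternative order.
enum : std::uint8_t { kTagCircle = 1, kTagSquare = 2 };

void serialize_shape(const Shape& s, std::vector<std::uint8_t>& out)
{
    if (const Circle* c = std::get_if<Circle>(&s)) {
        out.push_back(kTagCircle);  // type tag first, then the payload
        out.push_back(c->radius);
    } else {
        out.push_back(kTagSquare);
        out.push_back(std::get<Square>(s).side);
    }
}

Shape deserialize_shape(const std::vector<std::uint8_t>& in, std::size_t& pos)
{
    if (pos + 2 > in.size()) throw std::out_of_range("truncated shape");
    std::uint8_t tag = in[pos++];
    std::uint8_t payload = in[pos++];
    switch (tag) {
        case kTagCircle: return Circle{payload};
        case kTagSquare: return Square{payload};
        default: throw std::runtime_error("unknown type tag");
    }
}
```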
Versioning
On top of all the above, you might want different versions of the software to interoperate.
In this case each object should write a version number at the beginning of its serialization.
When loading the object on the other side, newer code may be able to handle the older representations, but older code cannot handle the newer ones, so it should throw an exception in that case.
Each time something in the serialized format changes, you should bump the version number.
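A sketch of this versioning scheme with a hypothetical Task message, in which version 1 carried only id and version 2 added priority. It assumes C++17. Older payloads are read with defaults for the missing field; payloads newer than the code's kVersion are rejected with an exception.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical message. Version 1 had only `id`; version 2 added `priority`.
struct Task {
    static constexpr std::uint8_t kVersion = 2;
    std::uint8_t id = 0;
    std::uint8_t priority = 0;  // absent in version 1 payloads, defaults to 0

    void serialize(std::vector<std::uint8_t>& out) const
    {
        out.push_back(kVersion);  // version number goes first
        out.push_back(id);
        out.push_back(priority);
    }

    static Task deserialize(const std::vector<std::uint8_t>& in)
    {
        if (in.empty()) throw std::runtime_error("empty input");
        std::uint8_t version = in[0];
        if (version == 0 || version > kVersion)  // unknown or newer than this code
            throw std::runtime_error("unsupported version " + std::to_string(version));
        std::size_t need = (version >= 2) ? 3 : 2;
        if (in.size() < need) throw std::out_of_range("truncated task");
        Task t;
        t.id = in[1];
        if (version >= 2) t.priority = in[2];
        return t;
    }
};
```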
So to wrap this up, serialization can be complex. Fortunately, you don't need to serialize everything in your program; most often only the protocol messages are serialized, and these are often plain old structs. So you won't need the complex tricks I mentioned above very often.