I have done a performance comparison between several serialization protocols, including FlatBuffers, Cap'n Proto, Boost serialization, and cereal. All the tests are written in C++.
I know that FlatBuffers and Cap'n Proto use zero-copy. With zero-copy, serialization time is essentially zero, but the size of the serialized objects is bigger.
I thought that cereal and Boost serialization didn't use zero-copy. However, their serialization time (for int and double) is nearly zero, and the size of their serialized objects is nearly the same as Cap'n Proto's or FlatBuffers'. I didn't find any information about zero-copy in their documentation.
Do cereal and Boost serialization use zero-copy too?
Boost and Cereal do not implement zero-copy in the sense of Cap'n Proto or FlatBuffers.
With true zero-copy serialization, the backing store for your live in-memory objects is in fact exactly the same memory segment that is passed to the read() or write() system calls. There is no packing/unpacking step at all.
Generally, this has a number of implications:
- Objects are not allocated using new/delete. When constructing a message, you allocate the message first, which allocates a long contiguous memory space for the message contents. You then allocate the message structure directly inside the message, receiving pointers that in fact point into the message's memory. When the message is later written, a single write() call shoves this whole memory space out to the wire.
- Similarly, when you read in a message, a single read() call (or maybe 2-3) reads the entire message into one block of memory. You then get a pointer (or a pointer-like object) to the "root" of the message, which you can use to traverse it. Note that no part of the message is actually inspected until your application traverses it. (A short build/write/read sketch follows this list.)
- With normal sockets, the only copies of your data happen in kernel space. With RDMA networking, you may even be able to avoid kernel-space copies: the data comes off the wire directly into its final memory location.
- When working with files (rather than networks), it's possible to mmap() a very large message directly from disk and use the mapped memory region directly. Doing so is O(1) -- it doesn't matter how big the file is. Your operating system will automatically page in the necessary parts of the file when you actually access them.
- Two processes on the same machine can communicate through shared memory segments with no copies. Note that, generally, regular old C++ objects do not work well in shared memory, because the memory segments usually don't have the same address in both memory spaces, thus all the pointers are wrong. With a zero-copy serialization framework, the pointers are usually expressed as offsets rather than absolute addresses, so that they are position-independent.
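To make the arena-style build/write/read pattern concrete, here is a minimal Cap'n Proto sketch. The `Point` schema, its fields, and the function names are made up for illustration, and error handling is omitted:

```cpp
// Hypothetical schema, point.capnp, compiled with the capnp tool:
//   struct Point { x @0 :Float64; y @1 :Float64; }
#include <capnp/message.h>
#include <capnp/serialize.h>
#include "point.capnp.h"  // generated header for the hypothetical schema

void writePoint(int fd) {
  // The builder owns one contiguous arena; every object lives inside it.
  capnp::MallocMessageBuilder message;
  Point::Builder point = message.initRoot<Point>();
  point.setX(1.0);  // the setters write directly into the arena
  point.setY(2.0);
  capnp::writeMessageToFd(fd, message);  // one write of the whole arena
}

void readPoint(int fd) {
  // Reads the message into one block of memory; nothing is "unpacked".
  capnp::StreamFdMessageReader message(fd);
  Point::Reader point = message.getRoot<Point>();
  double x = point.getX();  // fields are only decoded when accessed
  double y = point.getY();
  (void)x; (void)y;
}
```

Note how there is no intermediate "serialize" step: the arena the builder allocated is itself the wire format.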
Boost and Cereal are different: When you receive a message in these systems, first a pass is performed over the entire message to "unpack" the contents. The final resting place of the data is in objects allocated in the traditional way using new/delete. Similarly, when sending a message, the data has to be collected from this tree of objects and packed together into one buffer in order to be written out. Even though Boost and Cereal are "extensible", being truly zero-copy requires a very different underlying design; it cannot be bolted on as an extension.
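For contrast, here is a minimal sketch of the traditional pack/unpack style using cereal (the `Point` type is illustrative): the data lives in an ordinary C++ object, and the archive copies each member into or out of a stream buffer.

```cpp
#include <cereal/archives/binary.hpp>
#include <sstream>

struct Point {
  double x, y;
  template <class Archive>
  void serialize(Archive& ar) { ar(x, y); }  // cereal visits each member
};

int main() {
  std::ostringstream os;
  {
    cereal::BinaryOutputArchive oar(os);
    Point out{1.0, 2.0};
    oar(out);  // members are packed (copied) into the stream buffer
  }            // the archive must go out of scope to flush

  std::istringstream is(os.str());
  Point in{};
  {
    cereal::BinaryInputArchive iar(is);
    iar(in);   // members are unpacked (copied) into a regular C++ object
  }
}
```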
That said, don't assume zero-copy will always be faster. memcpy() can be pretty fast, and the rest of your program may dwarf the cost. Meanwhile, zero-copy systems tend to have inconvenient APIs, particularly because of the restrictions on memory allocation. It may be overall a better use of your time to use a traditional serialization system.
The place where zero-copy is most obviously advantageous is when manipulating files, since as I mentioned you can easily mmap() a huge file and only read part of it. Non-zero-copy formats simply can't do that. When it comes to networking, though, the advantages are less clear, since the network communication itself is necessarily O(n).
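As a rough illustration of the file case, here is a POSIX sketch of mapping a file (the helper name is made up and error handling is minimal). A zero-copy reader can then be pointed at the mapped region, and only the pages actually touched get read from disk:

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a (possibly huge) file into memory in O(1); only the pages that are
// actually touched later get paged in from disk by the OS.
const void* mapWholeFile(const char* path, std::size_t& sizeOut) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
  sizeOut = static_cast<std::size_t>(st.st_size);
  void* addr = mmap(nullptr, sizeOut, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after closing the descriptor
  return addr == MAP_FAILED ? nullptr : addr;
}
// A zero-copy reader (e.g. capnp::FlatArrayMessageReader) can then be
// pointed at the mapped region and traversed lazily.
```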
At the end of the day, if you really want to know which serialization system is fastest for your use case, you will probably need to try them all and measure them. Note that toy benchmarks are usually misleading; you need to test your actual use case (or something very similar) to get useful information.
Disclosure: I am the author of Cap'n Proto (a zero-copy serializer) and Protocol Buffers v2 (a popular non-zero-copy serializer).
Note: I bountied the other answer, which understood the full scope of the question better.
Boost Serialization is extensible. It allows your types to describe what needs to be serialized, and the archives to describe the format. This can be "zero-copy" - i.e. the only buffering is in the stream that receives your data (e.g. the socket or file descriptor).
For an example of a consciously zero-copy implementation of serialization for dynamic_bitset, see the code in this answer: How to serialize boost::dynamic_bitset?
I have a number of these on the site. Also look at the documentation for BOOST_IS_BITWISE_SERIALIZABLE and the effect it has on container serialization (if you serialize a contiguously allocated collection of bitwise-serializable data, the upshot is zero-copy or even __memcpy_sse4 etc.).
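As a rough sketch of that trait in action (the `Sample` type is made up for illustration): marking a trivially copyable type as bitwise serializable lets Boost's binary archives write a contiguous vector of it as one block rather than element by element.

```cpp
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/is_bitwise_serializable.hpp>
#include <boost/serialization/vector.hpp>
#include <fstream>
#include <vector>

struct Sample {
  double x, y;
  // Used by archives/paths that don't take the bitwise shortcut.
  template <class Archive>
  void serialize(Archive& ar, unsigned /*version*/) { ar & x & y; }
};

// Declare that Sample's raw bytes can be copied as-is; binary archives can
// then serialize a contiguous std::vector<Sample> as one block.
BOOST_IS_BITWISE_SERIALIZABLE(Sample)

int main() {
  const std::vector<Sample> samples(1000, Sample{1.0, 2.0});
  std::ofstream ofs("samples.bin", std::ios::binary);
  boost::archive::binary_oarchive oa(ofs);
  oa << samples;
}
```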
Side-note: Cap'n Proto does something else entirely, AFAIK: it marshals some objects as futures-to-the-data. This is apparently what they advertise aggressively as "∞% faster, 0µs!!!" (which is somewhat true in the case where the data is never retrieved).