C++ Serialization Performance

Posted 2019-01-10 18:05

Question:

I'm building a distributed C++ application that needs to do lots of serialization and deserialization of simple data structures that are passed between different processes and computers.

I'm not interested in serializing complex class hierarchies, but rather in sending structures with a few simple members such as numbers, strings and data vectors. The data vectors can often be many megabytes large. I'm worried that text/XML-based ways of doing it are too slow, and I really don't want to write this myself, since problems like string encoding and number endianness can make it way more complicated than it looks on the surface.

I've been looking a bit at protocol buffers and boost.serialization. According to the documentation, protocol buffers seem to care a lot about performance. Boost seems somewhat more lightweight in the sense that you don't have an external language for specifying the data format, which I find quite convenient for this particular project.

So my question comes down to this: does anyone know if the boost serialization is fast for the typical use case I described above?

Also if there are other libraries that might be right for this, I'd be happy to hear about them.

Answer 1:

I would strongly suggest protocol buffers. They're incredibly simple to use, offer great performance, and take care of issues like endianness and backwards compatibility. To make it even more attractive, serialized data is language-independent thanks to numerous language implementations.



Answer 2:

ACE and ACE TAO come to mind, but you might not like the size and scope of it. http://www.cs.wustl.edu/~schmidt/ACE.html

Regarding your query about "fast" and boost: that is a subjective term, and without knowing your requirements (throughput, etc.) it is difficult to answer that for you. Not that I have any benchmarks for the boost stuff myself...

There are messaging layers you can use, but those are probably slower than boost. I'd say that you identified a good solution in boost, but I've only used ACE and other proprietary communications/messaging products.



Answer 3:

My guess is that boost is fast enough. I have used it in previous projects to serialize data to and from disk, and its performance never even came up as an issue.

My answer here talks about serialization in general, which may be helpful to you beyond which serialization library you choose to use.

Having said that, it looks like you know most of the main trouble spots with serialization (endianness, string encoding). You did leave out versioning and forwards/backwards compatibility. If time is not critical, I recommend writing your own serialization code. It is an enlightening experience, and the lessons you learn are invaluable. Though I will warn you, it will tend to make you hate XML-based protocols for their bloat. :)
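To give a sense of what hand-written serialization involves, here is a minimal sketch, not a production design. The `Sample` struct, the `kFormatVersion` byte, and the little-endian, length-prefixed layout are all assumptions made for illustration; none of them come from a particular library. It touches the trouble spots named above: a fixed byte order, a leading version byte, and bounds checks when reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical message: a number, a string, and a bulk data vector.
struct Sample {
    std::uint32_t id = 0;
    std::string name;                // assumed to already be UTF-8 encoded
    std::vector<std::uint8_t> data;  // may be many megabytes
};

constexpr std::uint8_t kFormatVersion = 1;  // assumed: bump when the layout changes

// Append a 32-bit value in little-endian order, independent of the host CPU.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Read a 32-bit little-endian value at `pos` and advance `pos`.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4)
        throw std::runtime_error("truncated input");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Layout: version byte, id, name length + bytes, data length + bytes.
std::vector<std::uint8_t> serialize(const Sample& s) {
    // Lengths are stored as 32 bits in this sketch, so larger payloads are not supported.
    assert(s.name.size() <= UINT32_MAX && s.data.size() <= UINT32_MAX);
    std::vector<std::uint8_t> out;
    out.reserve(1 + 4 + 4 + s.name.size() + 4 + s.data.size());
    out.push_back(kFormatVersion);
    put_u32(out, s.id);
    put_u32(out, static_cast<std::uint32_t>(s.name.size()));
    out.insert(out.end(), s.name.begin(), s.name.end());
    put_u32(out, static_cast<std::uint32_t>(s.data.size()));
    out.insert(out.end(), s.data.begin(), s.data.end());
    return out;
}

Sample deserialize(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    if (in.empty() || in[pos++] != kFormatVersion)
        throw std::runtime_error("unsupported format version");
    Sample s;
    s.id = get_u32(in, pos);

    const std::uint32_t name_len = get_u32(in, pos);
    if (in.size() - pos < name_len) throw std::runtime_error("truncated name");
    auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.name.assign(first, first + static_cast<std::ptrdiff_t>(name_len));
    pos += name_len;

    const std::uint32_t data_len = get_u32(in, pos);
    if (in.size() - pos < data_len) throw std::runtime_error("truncated data");
    first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.data.assign(first, first + static_cast<std::ptrdiff_t>(data_len));
    return s;
}
```

Calling `deserialize(serialize(s))` should give back an equal `Sample`. A real format would also need a plan for reading older versions, which this sketch simply rejects.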

Whichever path you choose, good luck with your project.



Answer 4:

Also check out ONC-RPC (the old SUN-RPC).



Answer 5:

boost.serialization doesn't care about string encodings or endianness. You'll be similarly well off not using it if that matters to you.
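To illustrate why endianness matters once bytes leave the machine, here is a small sketch comparing a raw memory copy with an explicit big-endian encoding. It makes no claim about how any Boost archive is implemented; the function names (`raw_bytes`, `big_endian_bytes`, `from_big_endian`, `host_is_little_endian`, `self_check`) are hypothetical helpers written for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the in-memory representation as-is: the byte order depends on the host CPU.
std::array<std::uint8_t, 4> raw_bytes(std::uint32_t v) {
    std::array<std::uint8_t, 4> out{};
    std::memcpy(out.data(), &v, sizeof v);
    return out;
}

// Encode in big-endian (network) order using shifts: same bytes on every host.
std::array<std::uint8_t, 4> big_endian_bytes(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v >> 24),
             static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 8),
             static_cast<std::uint8_t>(v)}};
}

// Decode four big-endian bytes back into a host integer.
std::uint32_t from_big_endian(const std::array<std::uint8_t, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
           static_cast<std::uint32_t>(b[3]);
}

// True when the least significant byte is stored first in memory.
bool host_is_little_endian() {
    return raw_bytes(1)[0] == 1;
}

// Sanity check that could be run once at startup.
void self_check() {
    assert(from_big_endian(big_endian_bytes(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

On a little-endian host, `raw_bytes(0x11223344)` starts with `0x44`, while `big_endian_bytes(0x11223344)` starts with `0x11` everywhere. Two machines with different byte orders only agree if both use the explicit encoding.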

You might want to look into ICE from ZeroC: http://www.zeroc.com/

It works similarly to CORBA, except that it's entirely specced and defined by the company. The upside is that the implementations work as intended, since there aren't all that many. The downside is that if you're using a language they don't support, you're out of luck.



Answer 6:

If you are only sending well-defined data structures, then perhaps you should be looking at ASN.1 as an encoding methodology?



Answer 7:

There's also Thrift, which looks like an alpha project but is used and developed by Facebook, so it has a few users.

Or good old DCE, which was the standard MS decided to use for COM. It's now open source, 20 years too late, but better late than never.



Answer 8:

Don't pre-emptively optimize. Measure first and optimize second.
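A minimal sketch of what "measure first" could look like, using only the standard library. `mean_microseconds` and `copy_payload` are hypothetical names; `copy_payload` is just a stand-in workload (a plain copy of the buffer) that you would replace with a call into whichever serialization library you are evaluating.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Run `fn` `iterations` times and return the mean wall-clock time per call in microseconds.
template <typename Fn>
double mean_microseconds(Fn&& fn, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}

// Stand-in workload (an assumption): copying the payload is roughly the least
// work a serializer has to do with a large data vector.
std::vector<std::uint8_t> copy_payload(const std::vector<std::uint8_t>& payload) {
    return std::vector<std::uint8_t>(payload);
}
```

For example, with `std::vector<std::uint8_t> payload(4 * 1024 * 1024)`, calling `mean_microseconds([&] { sink += copy_payload(payload).size(); }, 100)` gives a baseline to compare a real serializer against. Accumulating into `sink` keeps the compiler from discarding the work. Build with optimizations on and use realistic payload sizes, otherwise the numbers say little.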