I am trying to reduce the memory size of boost archives in C++.
One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude. As a result, an empty Boost binary archive takes 62 bytes, while an empty text archive takes only 40 (the text representation of an empty text archive is: 22 serialization::archive 14 0 0 1 0 0 0 0 0).
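(For reference, such sizes can be measured roughly like this; the sketch below is illustrative and not the exact code used for the numbers above.)

    // Serialize nothing into each archive type and count the bytes written.
    // Link against boost_serialization.
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <cstddef>
    #include <iostream>
    #include <sstream>

    template <typename Archive>
    std::size_t empty_archive_size() {
        std::ostringstream oss;
        {
            Archive oa(oss); // the header is written on construction
        }                    // the destructor flushes the stream
        return oss.str().size();
    }

    int main() {
        std::cout << "binary: " << empty_archive_size<boost::archive::binary_oarchive>()
                  << " bytes, text: " << empty_archive_size<boost::archive::text_oarchive>()
                  << " bytes\n";
    }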
Is there any way to change this default behavior for ints?
Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?
As Alexey says, within Boost you'd have to use smaller member variables. The only serialisation schemes that do better are, AFAIK, Google Protocol Buffers and ASN.1 PER.
GPB uses variable-length integers, so the number of bytes on the wire matches the magnitude of the value being transferred.
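To illustrate the varint idea (this is a sketch of the general encoding, not Boost or protobuf code): each byte carries 7 payload bits plus a continuation flag, so small values take a single byte.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // GPB-style variable-length integer encoding: 7 payload bits per byte,
    // high bit set while more bytes follow.
    std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
        std::vector<std::uint8_t> out;
        do {
            std::uint8_t byte = value & 0x7F; // low 7 bits
            value >>= 7;
            if (value)
                byte |= 0x80;                 // continuation bit
            out.push_back(byte);
        } while (value);
        return out;
    }

    int main() {
        // Small values take 1 byte instead of a fixed 4 (or 8).
        std::cout << encode_varint(5).size() << ' '          // 1
                  << encode_varint(300).size() << ' '        // 2
                  << encode_varint(1u << 30).size() << '\n'; // 5
    }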
ASN.1 PER goes about it a different way: in an ASN.1 schema you define the valid range of values, so if you declare an int field with a valid range of 0 to 15, it will use only 4 bits. uPER goes further; it doesn't align fields to byte boundaries, saving yet more bits. uPER is what 3G and 4G use over the radio link, and it saves a lot of bandwidth.
So far as I know, most other approaches involve post-serialisation compression with ZIP or similar. Fine for large amounts of data, rubbish otherwise.
See Boost C++ Serialization overhead
That's because it's a serialization library, not a compression library.
Use the archive flags, e.g. as shown in Boost Serialization: How To Predict The Size Of The Serialized Result?
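The flags are passed as the archive constructor's second argument. A minimal sketch (the flag choice and the stream here are illustrative, not quoted from that answer):

    #include <boost/archive/binary_oarchive.hpp>
    #include <sstream>

    int main() {
        std::ostringstream oss;
        // boost::archive::no_header drops the "serialization::archive ..." preamble;
        // other flags (e.g. boost::archive::no_tracking, no_codecvt) live in the same enum.
        boost::archive::binary_oarchive oa(oss, boost::archive::no_header);
        const int x = 42;
        oa << x;
    }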
No. There is BOOST_IS_BITWISE_SERIALIZABLE(T), though (see e.g. Boost serialization bitwise serializability for an example and explanations).

Using make_array doesn't help for vector<int>: the demo (Live On Coliru) prints sizes that are no smaller with it than without it.
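A sketch in the same spirit as that demo (not the original Coliru code; the payload and helper names are illustrative):

    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/array_wrapper.hpp> // Boost >= 1.64; older releases keep make_array in boost/serialization/array.hpp
    #include <boost/serialization/vector.hpp>
    #include <cstddef>
    #include <iostream>
    #include <sstream>
    #include <vector>

    // Serialize via the supplied callback and return the number of bytes written.
    template <typename Fill>
    std::size_t archived_size(Fill fill) {
        std::ostringstream oss;
        {
            boost::archive::binary_oarchive oa(oss);
            fill(oa);
        }
        return oss.str().size();
    }

    int main() {
        const std::vector<int> v(100, 42);
        const std::size_t n = v.size();

        // Default serialization of the whole vector.
        const auto plain = archived_size([&](auto& oa) { oa << v; });

        // "Optimized" variant: element count plus a raw array wrapper.
        const auto arr = archived_size([&](auto& oa) {
            oa << n << boost::serialization::make_array(v.data(), n);
        });

        std::cout << "vector<int>: " << plain << " bytes, make_array: " << arr << " bytes\n";
    }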
Compression
The most straightforward way to optimize is to compress the resulting stream (see also the benchmarks added here).
Barring that, you will have to override the default serialization and apply your own compression (which could be simple run-length encoding, Huffman coding, or something more domain-specific).
Demo

The original demo is Live On Coliru; its output shows the uncompressed and compressed sizes.
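A sketch along the same lines (not the original Coliru demo): serialize into a string directly, then again through a boost::iostreams gzip filter, and compare the sizes. Link against boost_serialization, boost_iostreams and zlib.

    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/iostreams/device/back_inserter.hpp>
    #include <boost/iostreams/filter/gzip.hpp>
    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/serialization/vector.hpp>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    namespace io = boost::iostreams;

    int main() {
        const std::vector<int> data(1000, 7); // illustrative payload

        // Plain binary archive into a string.
        std::ostringstream oss;
        {
            boost::archive::binary_oarchive oa(oss, boost::archive::no_header);
            oa << data;
        }
        const std::string plain = oss.str();

        // The same archive, pushed through a gzip compressor first.
        std::string packed;
        {
            io::filtering_ostream os;
            os.push(io::gzip_compressor());
            os.push(io::back_inserter(packed));
            boost::archive::binary_oarchive oa(os, boost::archive::no_header);
            oa << data;
        } // the archive flushes first, then the stream's destructor writes the gzip trailer

        std::cout << "plain: " << plain.size()
                  << " bytes, gzipped: " << packed.size() << " bytes\n";
    }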
That's a compression ratio of ~11% (or ~19% if you drop the archive flags).