Boost binary archives - reducing size

Posted 2019-07-18 12:51

I am trying to reduce the memory size of boost archives in C++.

One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude. For this reason, I am getting that an empty boost binary archive takes 62 bytes while an empty text archive takes 40 (text representation of an empty text archive: 22 serialization::archive 14 0 0 1 0 0 0 0 0).

Is there any way to change this default behavior for ints?

Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?

2 answers
我命由我不由天
#2 · 2019-07-18 13:38

As Alexey says, within Boost you'd have to use smaller member variables. The only serialisations that do something better are, AFAIK, Google Protocol Buffers and ASN.1 PER.

GPB uses variable length integers to use a number of bytes appropriate to the value being transferred.
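
To make the variable-length idea concrete, here is a minimal sketch of a base-128 varint for unsigned values, in the style Protocol Buffers uses. The function names `encode_varint` and `decode_varint` are made up for illustration; they are not part of Boost or of the protobuf API, and signed values (which protobuf handles separately) are not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned value as a base-128 varint: each byte carries 7 payload
// bits, least significant group first; the high bit marks "more bytes follow".
std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));  // last byte, high bit clear
    return out;
}

// Decode a single varint from the start of `bytes`.
std::uint64_t decode_varint(const std::vector<std::uint8_t>& bytes) {
    assert(!bytes.empty());
    std::uint64_t value = 0;
    unsigned shift = 0;
    for (std::uint8_t b : bytes) {
        value |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;  // high bit clear: this was the last byte
        shift += 7;
        if (shift >= 64) break;      // guard against malformed, over-long input
    }
    return value;
}
```

With this scheme, values below 128 take one byte and 300 takes two, whereas a fixed-width 4-byte int always takes four; the trade-off is that very large 64-bit values can take up to ten bytes.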

ASN.1 PER goes about it a different way; in an ASN.1 schema you can define the valid range of values. Thus if you declare an int field to have a valid range between 0 and 15, it will use only 4 bits. uPER goes further; it doesn't align the bits for fields to byte boundaries, saving more bits. uPER is what 3G and 4G use over the radio link, and it saves a lot of bandwidth.
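
The range-constrained, unaligned packing described above can be sketched roughly as follows. This is not a real PER encoder, only an illustration of the idea; `bits_for_range` and `BitWriter` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to distinguish every value in the range [lo, hi].
unsigned bits_for_range(std::uint32_t lo, std::uint32_t hi) {
    assert(lo <= hi);
    std::uint64_t span = static_cast<std::uint64_t>(hi) - lo;  // largest offset
    unsigned bits = 0;
    while (span != 0) {
        ++bits;
        span >>= 1;
    }
    return bits;
}

// Appends fields back to back with no padding to byte boundaries.
class BitWriter {
public:
    // Append the low `bits` bits of `value`, most significant bit first.
    void write(std::uint32_t value, unsigned bits) {
        assert(bits <= 32);
        for (unsigned i = bits; i-- > 0;) {
            if (bit_count_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((value >> i) & 1u)
                bytes_.back() |= static_cast<std::uint8_t>(0x80u >> (bit_count_ % 8));
            ++bit_count_;
        }
    }
    std::size_t bit_count() const { return bit_count_; }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bit_count_ = 0;
};
```

Two fields constrained to 0..15 need 4 bits each and therefore share a single byte, where a Boost binary archive would spend 4 bytes on each int.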

So far as I know, most other endeavours involve post-serialisation compression with ZIP or similar. Fine for large amounts of data, rubbish otherwise.

地球回转人心会变
#3 · 2019-07-18 13:43
  1. Q. I am trying to reduce the memory size of boost archives in C++.

    See Boost C++ Serialization overhead

  2. Q. One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude.

    That's because it's a serialization library, not a compression library: the binary archive writes each primitive at its native size.

  3. Q. For this reason, I am getting that an empty boost binary archive takes 62 bytes while an empty text archive takes 40 (text representation of an empty text archive: 22 serialization::archive 14 0 0 1 0 0 0 0 0).

    Use the archive flags: e.g. from Boost Serialization : How To Predict The Size Of The Serialized Result?:

    • Tune things (boost::archive::no_codecvt, boost::archive::no_header, disable tracking etc.)

  4. Q. Is there any way to change this default behavior for ints?

    No. There is BOOST_IS_BITWISE_SERIALIZABLE(T) though (see e.g. Boost serialization bitwise serializability for an example and explanations).

  5. Q. Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?

    Using make_array doesn't help for vector<int>:

    Live On Coliru

    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/vector.hpp>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    
    static auto const flags = boost::archive::no_header | boost::archive::no_tracking;
    
    template <typename T>
    std::string direct(T const& v) {
        std::ostringstream oss;
        {
            boost::archive::binary_oarchive oa(oss, flags);
            oa << v;
        }
        return oss.str();
    }
    
    template <typename T>
    std::string as_pod_array(T const& v) {
        std::ostringstream oss;
        {
            boost::archive::binary_oarchive oa(oss, flags);
            oa << v.size() << boost::serialization::make_array(v.data(), v.size());
        }
        return oss.str();
    }
    
    int main() {
        std::vector<int> i(100);
        std::cout << "direct: "       << direct(i).size() << "\n";
        std::cout << "as_pod_array: " << as_pod_array(i).size() << "\n";
    }
    

    Prints

    direct: 408
    as_pod_array: 408
    

Compression

The most straightforward way to optimize is to compress the resulting stream (see also the benchmarks added here).

Barring that, you will have to override the default serialization and apply your own compression (which could be a simple run-length encoding, Huffman coding or something more domain-specific).
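
As a rough idea of what the simplest of those options looks like, here is a sketch of run-length encoding for a `std::vector<int>`. The names `Run`, `rle_encode` and `rle_decode` are hypothetical, and wiring this into a custom `save`/`load` pair for your own type is left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One run: `first` is how many times the value `second` repeats consecutively.
using Run = std::pair<std::size_t, int>;

// Collapse consecutive equal values into (count, value) runs.
std::vector<Run> rle_encode(const std::vector<int>& values) {
    std::vector<Run> runs;
    for (int v : values) {
        if (!runs.empty() && runs.back().second == v)
            ++runs.back().first;      // extend the current run
        else
            runs.emplace_back(1, v);  // start a new run
    }
    return runs;
}

// Expand the runs back into the original sequence.
std::vector<int> rle_decode(const std::vector<Run>& runs) {
    std::vector<int> values;
    for (const Run& r : runs)
        values.insert(values.end(), r.first, r.second);
    return values;
}
```

A vector of 100 zero-initialized ints collapses to a single run, but data with few repeats gets larger, since every run also stores a count. That is why the choice of scheme depends on the domain.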

Demo

Live On Coliru

#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <vector>

static auto const flags = boost::archive::no_header | boost::archive::no_tracking;

template <typename T>
size_t archive_size(T const& v)
{
    std::stringstream ss;
    {
        boost::archive::binary_oarchive oa(ss, flags);
        oa << v;
    }

    std::vector<char> compressed;
    {
        boost::iostreams::filtering_ostream fos;
        fos.push(boost::iostreams::bzip2_compressor());
        fos.push(boost::iostreams::back_inserter(compressed));

        boost::iostreams::copy(ss, fos);
    }

    return compressed.size();
}

int main() {
    std::vector<int> i(100);
    std::cout << "bzip2: " << archive_size(i) << "\n";
}

Prints

bzip2: 47

That's roughly 11% of the uncompressed 408 bytes (or ~19% if you drop the archive flags).
