Binary Serialization of std::bitset

Published 2019-02-16 13:20

std::bitset has a to_string() method for serializing it as a char-based string of 1s and 0s. Obviously, this uses a whole 8-bit char for each bit in the bitset, making the serialized representation eight times longer than necessary.
I want to store the bitset in a binary representation to save space. The to_ulong() method is relevant only when there are fewer than 32 bits in my bitset; I have hundreds.
I'm not sure I want to use memcpy()/std::copy() on the object (address) itself, as that assumes the object is a POD.

The API does not seem to provide a handle to the internal array representation from which I could have taken the address.

I would also like the option to deserialize the bitset from the binary representation.

How can I do this?

5 Answers
Anthone
#2 · 2019-02-16 13:24

Answering my own question for completeness.

Apparently, there is no simple and portable way of doing this.

For simplicity (though not efficiency), I ended up using to_string, then creating consecutive 32-bit bitsets from each 32-bit chunk of the string (and the remainder*), and calling to_ulong on each of these to collect the bits into a binary buffer.
This approach leaves the bit-twiddling to the STL itself, though it is probably not the most efficient way to do it.

* Note that since std::bitset is templated on the total bit-count, the remainder bitset needs to use some simple template meta-programming arithmetic.
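The chunking described above can be sketched roughly as follows (a runtime variant of the idea: the compile-time remainder arithmetic mentioned in the footnote is sidestepped by letting substr return a short final chunk, which the bitset<32> string constructor zero-fills; the helper name is illustrative):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative sketch of the approach described above: serialize via
// to_string(), then turn each 32-character chunk back into bits with
// std::bitset<32>::to_ulong(). The final chunk may be shorter than 32
// characters; the remaining high bits stay zero.
template <size_t N>
std::vector<uint32_t> bitset_to_words(const std::bitset<N>& bs)
{
    std::string s = bs.to_string();          // most significant bit first
    std::vector<uint32_t> words;
    for (size_t i = 0; i < s.size(); i += 32)
        words.push_back(static_cast<uint32_t>(
            std::bitset<32>(s.substr(i, 32)).to_ulong()));
    return words;
}
```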

仙女界的扛把子
#3 · 2019-02-16 13:35

As suggested by the folks at gamedev.net, one can try boost::dynamic_bitset, since it allows access to the internal representation of the bit-packed data (e.g. via boost::to_block_range and boost::from_block_range).

你好瞎i
#4 · 2019-02-16 13:35

I can't see an obvious way other than converting to a string and doing your own serialization of the string, grouping each chunk of 8 characters into a single serialized byte.

EDIT: Better is to just iterate over all the bits with operator[] and serialize them manually.

迷人小祖宗
#5 · 2019-02-16 13:43

This is a possible approach based on explicit creation of a std::vector<unsigned char>, reading/writing one bit at a time:

#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Pack bit j of the bitset into bit j%8 of byte j/8.
template <size_t N>
std::vector<unsigned char> bitset_to_bytes(const std::bitset<N>& bs)
{
    std::vector<unsigned char> result((N + 7) >> 3);   // ceil(N / 8) bytes
    for (size_t j = 0; j < N; ++j)
        result[j >> 3] |= static_cast<unsigned char>(bs[j]) << (j & 7);
    return result;
}

// Inverse of the above: rebuild the bitset from the packed bytes.
template <size_t N>
std::bitset<N> bitset_from_bytes(const std::vector<unsigned char>& buf)
{
    assert(buf.size() == ((N + 7) >> 3));
    std::bitset<N> result;
    for (size_t j = 0; j < N; ++j)
        result[j] = (buf[j >> 3] >> (j & 7)) & 1;
    return result;
}

Note that to call the deserialization function bitset_from_bytes, the bitset size N must be specified explicitly in the function call (it cannot be deduced from the arguments), for example:

std::bitset<N> bs1;
...
std::vector<unsigned char> buffer = bitset_to_bytes(bs1);
...
std::bitset<N> bs2 = bitset_from_bytes<N>(buffer);

If you really care about speed, one solution that would gain something is loop unrolling, so that the packing is done one byte at a time. Better still is to write your own bitset implementation that doesn't hide the internal binary representation, instead of using std::bitset.
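As an illustration of the byte-at-a-time idea (still built on std::bitset, so a sketch rather than a tuned implementation): repeatedly mask off the low 8 bits and shift a copy of the bitset right, so the per-bit inner loop disappears:

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Byte-at-a-time packing: take the low 8 bits through a mask, then
// shift the (by-value copied) bitset right by 8. to_ulong() is safe
// here because the masked bitset never holds more than 8 set bits.
template <size_t N>
std::vector<unsigned char> bitset_to_bytes_fast(std::bitset<N> bs)
{
    std::vector<unsigned char> result((N + 7) >> 3);
    const std::bitset<N> mask(0xFF);
    for (size_t i = 0; i < result.size(); ++i) {
        result[i] = static_cast<unsigned char>((bs & mask).to_ulong());
        bs >>= 8;
    }
    return result;
}
```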

三岁会撩人
#6 · 2019-02-16 13:50

Edit: the following does not work as intended. Apparently, "binary format" actually means "ASCII representation of binary".


You should be able to write them to a std::ostream using operator<<. It says here:

[Bitsets] can also be directly inserted and extracted from streams in binary format.
