Cast array of bytes to POD

Let's say, I have an array of unsigned chars that represents a bunch of POD objects (e.g. either read from a socket or via mmap). Which types they represent and at what position is determined at runtime, but we assume, that each is already properly aligned.

What is the best way to "cast" those bytes into the respective POD type?

A solution should either be compliant to the c++ standard (let's say >= c++11) or at least be guaranteed to work with g++ >= 4.9, clang++ >= 3.5 and MSVC >= 2015U3. EDIT: On linux, windows, running on x86/x64 or 32/64-Bit arm.

Ideally I'd like to do something like this:

uint8_t buffer[100]; //filled e.g. from network

switch(buffer[0]) {
    case 0: process(*reinterpret_cast<Pod1*>(&buffer[4]); break;
    case 1: process(*reinterpret_cast<Pod2*>(&buffer[8+buffer[1]*4]); break;
    //...
}

switch(buffer[0]) {
    case 0: {
         auto* ptr = new(&buffer[4]) Pod1; 
         process(*ptr); 
    }break;
    case 1: {
         auto* ptr = new(&buffer[8+buffer[1]*4]) Pod2; 
         process(*ptr); 
    }break;
    //...
}

Both seem to work, but both are AFAIK undefined behavior in c++¹⁾. And just for completeness: I'm aware of the "usual" solution to just copy the stuff into an appropriate local variable:

 Pod1 tmp;
 std::copy_n(&buffer[4],sizeof(tmp), reinterpret_cast<uint8_t*>(&tmp));             
 process(tmp);

In some situations it might be no overhead in others it is and in some situations it might even be faster but performance aside, I no longer can e.g. modify the data in place and to be honest: it just annoys me to know that I have the right bits at an appropriate location in memory but I just can't use them.

A somewhat crazy solution I came up with is this:

template<class T>
T* inplace_cast(uint8_t* data) {
    //checks omitted for brevity
    T tmp;
    std::memmove((uint8_t*)&tmp, data, sizeof(tmp));
    auto ptr = new(data) T;
    std::memmove(ptr, (uint8_t*)&tmp,  sizeof(tmp));
    return ptr;

}

g++ and clang++ seem to be able to optimize away those copies but I think this puts a lot of burden on the optimizer and might cause other optimizations to fail, doesn't work with const uint8_t* (although I don't want to actually modify it) and just looks horrible (don't think you would get that past code review).

¹⁾ The first one is UB because it breaks strict aliasing, the second one is probably UB (discussed here) because the standard just says that the resulting object is not initialized and has indeterminate value (instead of guaranteeing that the underlying memory is untouched). I believe the first one's equivalent c-code is well defined, so compilers might allow this for compatibility with c-headers, but I'm unsure of this.

标签： c++ strict-aliasing

2条回答

forever°为你锁心

2楼-- · 2019-03-12 15:12

The most correct way is to create a (temporary) variable of the desired POD class, and to use memcpy() to copy data from the buffer into that variable:

switch(buffer[0]) {
    case 0: {
        Pod1 var;
        std::memcpy(&var, &buffer[4], sizeof var);
        process(var);
        break;
    }
    case 1: {
        Pod2 var;
        std::memcpy(&var, &buffer[8 + buffer[1] * 4], sizeof var);
        process(var);
        break;
    }
    //...
}

There main reason for doing this is because of alignment issues: the data in the buffer may not be aligned correctly for the POD type you are using. Making a copy eliminates this problem. It also allows you to keep using the variable even if the network buffer is no longer available.

Only if you are absolutely sure that the data is properly aligned can you use the first solution you gave.

(If you are reading in data from the network, you should always check that the data is valid first, and that you won't read outside of your buffer. For example with &buffer[8 + buffer[1] * 4], you should check that the start of that address plus the size of Pod2 does not exceed the buffer length. Luckily you are using uint8_t, otherwise you'd also have to check that buffer[1] is not negative.)

0人赞添加讨论(0) 举报

The star\"

3楼-- · 2019-03-12 15:22

Using a union allows to escape anti-aliasing rule. In fact that is what unions are for. So casting pointers to a union type from a type that is part of the union is explicitly allowed in C++ standard (Clause 3.10.10.6). Same thing is allowed in C standard (6.5.7).

Therefore depending on the other properties a conforming equivalent of your sample can be as follows.

union to_pod {
    uint8_t buffer[100];
    Pod1 pod1;
    Pod1 pod2;
    //...
};

uint8_t buffer[100]; //filled e.g. from network

switch(buffer[0]) {
    case 0: process(reinterpret_cast<to_pod*>(buffer + 4)->pod1); break;
    case 1: process(reinterpret_cast<to_pod*>(buffer + 8 + buffer[1]*4)->pod2); break;
    //...
}

0人赞添加讨论(0) 举报

Cast array of bytes to POD

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间