Casting an unsigned int + a string to an unsigned

Posted 2019-09-14 03:34

Question:

I'm working with the NetLink socket library ( https://sourceforge.net/apps/wordpress/netlinksockets/ ), and I want to send some binary data over the network in a format that I specify.

The format I have planned is pretty simple and is as follows:

  • Bytes 0 and 1: an opcode of the type uint16_t (i.e., an unsigned integer always 2 bytes long)

  • Bytes 2 onward: any other data necessary, such as a string, an integer, or a combination of both. The other party will interpret this data according to the opcode. For example, if the opcode is 0, which represents "log in", this data will consist of a one-byte integer giving the length of the username, followed by a string containing the username, followed by a string containing the password. For opcode 1, "send a chat message", the entire data could be just a string holding the chat message.

Here's what the library gives me to work with for sending data, though:

void send(const string& data);
void send(const char* data);
void rawSend(const vector<unsigned char>* data);

I'm assuming I want to use rawSend() for this, but rawSend() takes unsigned chars, not a void* pointer to memory. Isn't there going to be some loss of data if I try to cast certain types of data to an array of unsigned chars? Please correct me if I'm wrong. But if I'm right, does this mean I should be looking at another library that supports real binary data transfer?

Assuming this library does serve my purposes, how exactly would I cast and concatenate my various data types into one std::vector? What I've tried is something like this:

#define OPCODE_LOGINREQUEST 0

std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
loginRequestData->push_back(opcode);
// and at this point (not shown), I would push_back() the individual characters of the username and password strings, after a one-byte integer giving the length of the username (so you know where the username stops and the password begins)
socket->rawSend(loginRequestData);

I ran into some exceptions on the other end, though, when I tried to interpret the data. Am I approaching the casting all wrong? Am I going to lose data by casting to unsigned chars?

Thanks in advance.

Answer 1:

I like how they make you create a vector (which must use the heap and thus execute in unpredictable time) instead of just falling back to the C standard (const void* buffer, size_t len) tuple, which is compatible with everything and can't be beat for performance. Oh, well.

You could try this:

void send_message(uint16_t opcode, const void* rawData, size_t rawDataSize)
{
    vector<unsigned char> buffer;
    buffer.reserve(sizeof(uint16_t) + rawDataSize);
#if BIG_ENDIAN_OPCODE
    buffer.push_back(opcode >> 8);
    buffer.push_back(opcode & 0xFF);
#elif LITTLE_ENDIAN_OPCODE
    buffer.push_back(opcode & 0xFF);
    buffer.push_back(opcode >> 8);
#else
    // Native order opcode
    buffer.insert(buffer.end(), reinterpret_cast<const unsigned char*>(&opcode), 
        reinterpret_cast<const unsigned char*>(&opcode) + sizeof(uint16_t));
#endif
    const unsigned char* base(reinterpret_cast<const unsigned char*>(rawData));
    buffer.insert(buffer.end(), base, base + rawDataSize);
    socket->rawSend(&buffer); // Why isn't this API using a reference?!
}

This uses insert which should optimize better than a hand-written loop with push_back(). It also won't leak the buffer if rawSend tosses an exception.

NOTE: Byte order must match for the platforms on both ends of this connection. If it does not, you'll need to either pick one byte order and stick with it (Internet standards usually do this, and you use the htonl and htons functions) or you need to detect byte order ("native" or "backwards" from the receiver's POV) and fix it if "backwards".
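
As a rough illustration of the "pick one byte order and stick with it" option, the sketch below writes and reads a 16-bit value in big-endian (network) order using shifts, so the bytes on the wire do not depend on either host's native order. The helper names putU16BE and getU16BE are hypothetical and are not part of NetLink; an 8-bit unsigned char is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append a 16-bit value in big-endian (network) order.
// Shifting works on the value, not on its in-memory layout, so the output
// is the same on little-endian and big-endian hosts.
void putU16BE(std::vector<unsigned char>& buf, std::uint16_t value)
{
    buf.push_back(static_cast<unsigned char>(value >> 8));    // high byte first
    buf.push_back(static_cast<unsigned char>(value & 0xFF));  // then low byte
}

// Hypothetical helper: read a big-endian 16-bit value starting at offset.
std::uint16_t getU16BE(const std::vector<unsigned char>& buf, std::size_t offset)
{
    assert(offset + 2 <= buf.size());  // caller must supply two readable bytes
    return static_cast<std::uint16_t>((buf[offset] << 8) | buf[offset + 1]);
}
```

With helpers like these on both ends, neither side needs to know or detect the other's native byte order.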



Answer 2:

I would use something like this:

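#include <algorithm>    // std::min
#include <cstdint>      // uint8_t, uint16_t
#include <string>
#include <vector>
#include <arpa/inet.h>  // htons (POSIX; on Windows use <winsock2.h>)

#define OPCODE_LOGINREQUEST 0
#define OPCODE_MESSAGE 1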

void addRaw(std::vector<unsigned char> &v, const void *data, const size_t len)
{
    const unsigned char *ptr = static_cast<const unsigned char*>(data);
    v.insert(v.end(), ptr, ptr + len);
}

void addUint8(std::vector<unsigned char> &v, uint8_t val)
{
    v.push_back(val);
}

void addUint16(std::vector<unsigned char> &v, uint16_t val)
{
    val = htons(val);
    addRaw(v, &val, sizeof(uint16_t));
}

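void addStringLen(std::vector<unsigned char> &v, const std::string &val)
{
    // std::min needs both arguments to have the same type; the length
    // prefix is a single byte, so the string is capped at 255 characters
    uint8_t len = static_cast<uint8_t>(std::min<size_t>(val.length(), 255));
    addUint8(v, len);
    addRaw(v, val.c_str(), len);
}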

void addStringRaw(std::vector<unsigned char> &v, const std::string &val)
{
    addRaw(v, val.c_str(), val.length());
}

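void sendLogin(const std::string &user, const std::string &pass)
{
    // reserve() only allocates; constructing the vector with a size would
    // fill it with zero bytes and the add* calls would append after them
    std::vector<unsigned char> data;
    data.reserve(
        sizeof(uint16_t) +
        sizeof(uint8_t) + std::min<size_t>(user.length(), 255) +
        sizeof(uint8_t) + std::min<size_t>(pass.length(), 255)
    );
    addUint16(data, OPCODE_LOGINREQUEST);
    addStringLen(data, user);
    addStringLen(data, pass);
    socket->rawSend(&data);
}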

void sendMsg(const std::string &msg)
{
    // reserve() rather than a sized constructor, so the vector starts empty
    std::vector<unsigned char> data;
    data.reserve(
      sizeof(uint16_t) +
      msg.length()
    );
    addUint16(data, OPCODE_MESSAGE);
    addStringRaw(data, msg);
    socket->rawSend(&data);
}
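
For the receiving side, a minimal sketch of how a login packet in this layout might be parsed is shown below. It assumes the opcode arrives in network byte order (as htons produces) and that both strings carry a one-byte length prefix, as addStringLen writes them. LoginRequest, readStringLen and parseLogin are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the decoded fields.
struct LoginRequest {
    std::string user;
    std::string pass;
};

// Reads one length-prefixed string starting at pos and advances pos past it.
// Returns false if the buffer is too short to hold the announced length.
bool readStringLen(const std::vector<unsigned char>& v, std::size_t& pos, std::string& out)
{
    if (pos >= v.size()) return false;           // no length byte available
    const std::size_t len = v[pos];
    if (v.size() - pos - 1 < len) return false;  // string would run past the end
    out.assign(reinterpret_cast<const char*>(v.data()) + pos + 1, len);
    pos += 1 + len;
    assert(pos <= v.size());                     // invariant: never past the end
    return true;
}

// Parses opcode 0 (login request): uint16 opcode, then two prefixed strings.
bool parseLogin(const std::vector<unsigned char>& v, LoginRequest& out)
{
    if (v.size() < 2) return false;
    const std::uint16_t opcode = static_cast<std::uint16_t>((v[0] << 8) | v[1]);
    if (opcode != 0) return false;               // not a login request
    std::size_t pos = 2;
    return readStringLen(v, pos, out.user)
        && readStringLen(v, pos, out.pass)
        && pos == v.size();                      // reject trailing bytes
}
```

Checking every length against the remaining buffer size before reading is what keeps a truncated or malformed packet from causing out-of-range reads on the receiver.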


Answer 3:

std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
loginRequestData->push_back(opcode);

If unsigned char is 8 bits long (which it is on most systems), you will lose the upper 8 bits of opcode every time you push. You should be getting a warning for this.
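
The truncation can be seen in a small sketch, assuming an 8-bit unsigned char. The function name pushOpcodeNaively is hypothetical; it only reproduces the single push_back() from the question, with the conversion made explicit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reproduction of the question's code: one push_back() of a
// 16-bit opcode into a vector of unsigned char.
std::vector<unsigned char> pushOpcodeNaively(std::uint16_t opcode)
{
    std::vector<unsigned char> data;
    // The explicit cast performs the same narrowing as the implicit
    // conversion in the question: only the low 8 bits survive.
    data.push_back(static_cast<unsigned char>(opcode));
    assert(data.size() == 1);  // one byte was stored, not two
    return data;
}
```

For an opcode of 0x0102 the vector holds the single byte 0x02. Even for opcode 0 only one byte is sent, so a receiver that expects a two-byte opcode will read every following field one byte off.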

The decision for rawSend to take a vector is quite odd; a general library would work at a different level of abstraction. I can only guess that it is this way because rawSend makes a copy of the passed data and guarantees its lifetime until the operation has completed. If not, then it is just a poor design choice; add to that the fact that it's taking the argument by pointer... You should see this vector as a container of raw memory. There are some quirks to get right, but here is how you would be expected to work with POD types in this scenario:

data->insert( data->end(), reinterpret_cast< char const* >( &opcode ), reinterpret_cast< char const* >( &opcode ) + sizeof( opcode ) );


Answer 4:

This will work:

#define OPCODE_LOGINREQUEST 0

std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
unsigned char *opcode_data = (unsigned char *)&opcode;
for(size_t i = 0; i < sizeof(opcode); i++) // size_t avoids a signed/unsigned comparison
    loginRequestData->push_back(opcode_data[i]);
socket->rawSend(loginRequestData);

This will also work for any POD type.



Answer 5:

Yeah, go with rawSend, since send probably expects a null terminator.

You don't lose anything by casting to char instead of void*. Memory is memory. Types are never stored in memory in C++ except for RTTI info. You can recover your data by casting to the type indicated by your opcode.

If you can decide the format of all your sends at compile time, I recommend using structs to represent them. I've done this before professionally, and this is simply the best way to clearly store the formats for a wide variety of messages. And it's super easy to unpack on the other side; just cast the raw buffer into the struct based on the opcode!

struct MessageType1 {
    uint16_t opcode;
    int myData1;
    int myData2;
};

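MessageType1 msg;
// fill in msg.opcode, msg.myData1 and msg.myData2 here

std::vector<unsigned char> vec;
// both iterators must have the same type, so view the struct as bytes
const unsigned char* begin = reinterpret_cast<const unsigned char*>(&msg);
vec.insert( vec.end(), begin, begin + sizeof(msg) );

socket->rawSend(&vec);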
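
The unpacking step described above can be sketched as a byte-for-byte round trip. This is only an illustration under assumptions: both ends must be built with the same struct layout (padding and the size of int) and the same byte order, and copying with memcpy is used here instead of a pointer cast to sidestep alignment and aliasing concerns. MessageType1 is the struct from this answer; pack and unpack are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct MessageType1 {
    std::uint16_t opcode;
    int myData1;
    int myData2;
};

// Raw byte copies are only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable<MessageType1>::value,
              "MessageType1 must be trivially copyable to be sent as raw bytes");

// Hypothetical helper: copy the struct's bytes (including any padding).
std::vector<unsigned char> pack(const MessageType1& msg)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&msg);
    std::vector<unsigned char> bytes(p, p + sizeof msg);
    assert(bytes.size() == sizeof msg);
    return bytes;
}

// Hypothetical helper: rebuild the struct on the receiving side.
// Returns false if the buffer is not exactly one MessageType1 long.
bool unpack(const std::vector<unsigned char>& bytes, MessageType1& out)
{
    if (bytes.size() != sizeof out) return false;
    std::memcpy(&out, bytes.data(), sizeof out);
    return true;
}
```

No information is lost by passing through unsigned char, because the bytes are copied unchanged; the risk lies in the two ends disagreeing about layout or byte order.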

The struct approach is the best, neatest way to send and receive, but the layout is fixed at compile time. If the format of the messages is not decided until runtime, use a char array:

unsigned char buffer[2048];

memcpy(buffer, &opcode, sizeof(opcode)); // needs <cstring>; avoids an aliasing-unsafe pointer cast
// now memcpy the rest of the payload into it
// or placement-new to construct objects in the buffer memory

size_t usedBufferSpace = 24; //or whatever

std::vector<unsigned char> vec;
const unsigned char* end = buffer + usedBufferSpace;
vec.insert( vec.end(), buffer, end );

socket->rawSend(&vec);