What's the best practice when you have an arra

Posted 2019-08-03 23:15

Question:

I have a custom struct that I'm going to use to send data over a TCP connection. What would be the best way of declaring an array inside this struct? Would it be:

#include <stdlib.h>

typedef struct programData {
    int* dataArray;
    size_t numberofelements;
} pd;
// ...
pd data = {0};
data.dataArray = malloc(5 * sizeof(int));
// put content in array ...
data.numberofelements = 5;

Or would it be this way:

typedef struct programData {
    int dataArray[5];
} pd;
// ...
pd data = {0};
data.dataArray[0] = ...;
// ...
data.dataArray[4] = ...;

I did it the first way out of the habit of using malloc() in C, but I don't think the contents of the array would actually be passed on to the client on the other side of the connection, since dataArray would actually be a pointer to a memory address inside the server's memory. Or would send(2) actually send the contents of the array along with it?

Edit: corrected some inconsistencies caused by copy-pasting from my code.

Answer 1:

send is not a service for transmitting compound data structures; it does not interpret pointers or follow them to the data they refer to. It is a service for sending raw bytes. When using send, you must transform your data into raw bytes that can be sent. The receiver must construct their own data structures from those bytes. This means you must create a scheme for representing your data using bytes.
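As an illustration, here is a minimal sketch of one such scheme, assuming a hypothetical wire format in which a 4-byte big-endian element count is followed by each element as a 4-byte big-endian two's-complement integer. The names encode_int_array and put_u32_be are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: [count: u32 big-endian][count x i32 big-endian] */

/* Store v most-significant byte first, independent of host byte order. */
static void put_u32_be(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Encode count values into out (cap bytes long).
 * Returns the number of bytes written, or 0 if out is too small. */
size_t encode_int_array(const int32_t *values, uint32_t count,
                        unsigned char *out, size_t cap) {
    if (count > (SIZE_MAX - 4) / 4)
        return 0;                       /* total size would overflow size_t */
    size_t needed = 4 + (size_t)count * 4;
    if (cap < needed)
        return 0;
    put_u32_be(out, count);
    for (uint32_t i = 0; i < count; i++)
        /* int32_t -> uint32_t conversion is defined modulo 2^32, which
         * yields the two's-complement bit pattern for negative values. */
        put_u32_be(out + 4 + (size_t)i * 4, (uint32_t)values[i]);
    return needed;
}
```

Because every byte is placed explicitly, the output is identical on any host, and the receiver reads the count first to learn how many bytes follow.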

When the raw bytes of a structure are sent to another system, and the receiving system uses those same raw bytes to represent a structure, the resulting meaning of the data may differ for reasons including:

  • The systems represent objects (such as integers) with bytes in different orders.
  • The systems insert different numbers of padding bytes in the structure to maintain alignment required or preferred by hardware.
  • The systems use different encodings for characters or floating-point data.
  • The types on the two systems differ in size, as when one uses two bytes for int while the other uses four.
  • Pointers on one system are meaningless on the other system: they point to data that was never transmitted, and the addresses they contain are not relevant to the address layout on the other system.
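To make the last point concrete, the sketch below copies the raw bytes of the pointer-based structure from the question, which is what send(fd, &data, sizeof data, 0) would hand to the kernel. The helper raw_bytes_of is hypothetical. The copy is always sizeof(pd) bytes no matter how many elements were allocated, and the bytes where dataArray sits hold only the sender's address.

```c
#include <stddef.h>
#include <string.h>

typedef struct programData {
    int *dataArray;
    size_t numberofelements;
} pd;

/* Copy the in-memory bytes of data into out (which must hold sizeof(pd)
 * bytes), exactly the bytes a send() of the struct would transmit.
 * Returns the number of bytes copied. */
size_t raw_bytes_of(const pd *data, unsigned char *out) {
    memcpy(out, data, sizeof *data);
    return sizeof *data;
}
```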

With a simple data structure, the protocol can be defined as sending the actual bytes that represent the data structure. This is especially true if the sending and receiving systems use the same hardware and software. However, even in such cases, the protocol should be clearly specified: how big each element is, what data encodings are used, in what order the bytes within each element are sent, and so on.

Assuming you have simple data structures and use a simple protocol of sending the actual bytes that represent the data, then of course declaring an array inside the structure is the simplest option. If the array is small or is usually nearly full, so that only a small amount of space is wasted by storing and transmitting unused elements, then declaring an array inside the structure may be a fine solution.
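Under those assumptions (both ends built for the same platform, so element size, byte order, and padding match), a minimal sketch of the fixed-size version can transmit the structure's own bytes. Because send may write fewer bytes than requested on a stream socket, the sketch loops until everything is written; send_all and send_fixed_pd are hypothetical helper names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct programData {
    int32_t dataArray[5];   /* fixed-width elements, no pointer */
} pd;

/* Send exactly len bytes; send() on a stream socket may write fewer. */
static int send_all(int fd, const void *buf, size_t len) {
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error; errno is set by send() */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* The array lives inside the struct, so its bytes are the struct's bytes. */
int send_fixed_pd(int fd, const pd *data) {
    return send_all(fd, data, sizeof *data);
}
```

The receiver can fill a pd of the same layout with recv and MSG_WAITALL, or with a matching receive loop.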

If the amount of data needed in the array will vary more than slightly, then it is usually preferred to allocate the array dynamically, as a matter of resource efficiency. As shown in your question, the structure may contain a pointer, which is filled in with the address of the array data.

When a structure contains such a pointer, you cannot send the pointer with send (without making additional efforts to provide for its interpretation). Instead, you will need to use one or more send calls to send the other data in the structure, and then you will need another send call to send the data in the array. And, of course, your protocol for transmitting the data must include a way to communicate the number of array elements being sent.
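A minimal sketch of that two-step approach, again assuming both ends share the same integer representation: a fixed-width count is sent first, then the array contents the pointer refers to. The names send_all, recv_all, send_pd, and recv_pd are hypothetical, and the count is narrowed to uint32_t on the wire so its size does not depend on each platform's size_t.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct programData {
    int32_t *dataArray;
    size_t numberofelements;
} pd;

static int send_all(int fd, const void *buf, size_t len) {
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR) continue;   /* retry after a signal */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int recv_all(int fd, void *buf, size_t len) {
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return -1;   /* peer closed before all data arrived */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* First send: the element count. Second send: the pointed-to data. */
int send_pd(int fd, const pd *data) {
    if (data->numberofelements > UINT32_MAX) return -1;
    uint32_t count = (uint32_t)data->numberofelements;
    if (send_all(fd, &count, sizeof count) != 0) return -1;
    return send_all(fd, data->dataArray,
                    data->numberofelements * sizeof *data->dataArray);
}

/* Receiver: read the count, allocate its own array, then read elements. */
int recv_pd(int fd, pd *out) {
    uint32_t count;
    if (recv_all(fd, &count, sizeof count) != 0) return -1;
    int32_t *arr = NULL;
    if (count > 0) {
        /* A real server should also cap count before trusting it. */
        arr = calloc(count, sizeof *arr);
        if (arr == NULL) return -1;
        if (recv_all(fd, arr, (size_t)count * sizeof *arr) != 0) {
            free(arr);
            return -1;
        }
    }
    out->dataArray = arr;
    out->numberofelements = count;
    return 0;
}
```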

One more option combines dynamic allocation of space for the array with including the array in the structure: the last member of a structure may be a flexible array member. This is an array declared as the final member of the structure, as Type dataArray[];. It has no intrinsic size; when allocating space for the structure, you add extra space for the array. In this case, instead of the structure holding a pointer to an array, the array follows the base portion of the structure in memory. Such a structure with its array can be sent in a single send call, provided the cautions above are addressed: the receiving system must be able to interpret the bytes correctly, and the size of the array must be communicated.
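A minimal sketch of the flexible-array-member layout, assuming the same same-platform raw-bytes protocol as above. The element count is stored in the header so it travels with the data; pd_alloc, pd_size, and pd_send are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct programData {
    uint32_t numberofelements;  /* tells the receiver how many follow */
    int32_t dataArray[];        /* flexible array member: must be last */
} pd;

/* One allocation holds the header and the array contiguously. */
pd *pd_alloc(uint32_t n) {
    if (n > (SIZE_MAX - sizeof(pd)) / sizeof(int32_t))
        return NULL;            /* size computation would overflow */
    pd *p = malloc(sizeof(pd) + (size_t)n * sizeof(int32_t));
    if (p != NULL)
        p->numberofelements = n;
    return p;
}

/* Bytes occupied by the base struct plus its elements. */
size_t pd_size(const pd *p) {
    return sizeof *p + (size_t)p->numberofelements * sizeof p->dataArray[0];
}

/* A single buffer covers header and array, so one send loop suffices. */
int pd_send(int fd, const pd *p) {
    const unsigned char *b = (const unsigned char *)p;
    size_t len = pd_size(p);
    while (len > 0) {
        ssize_t n = send(fd, b, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        b += n;
        len -= (size_t)n;
    }
    return 0;
}
```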



Answer 2:

Best practice is to let the requirements of your project determine which approach to use. Both have distinct advantages depending on what is needed.

Given your two examples:

1)

typedef struct programData {
    int dataArray[5]; // assuming '*' was a typo
} pd;

2)

typedef struct programData {
    int* dataArray;
    size_t numberofelements;
} pd; 

If you know the size requirement before run time, then Option 1), the simpler approach, is always preferred. If not, then Option 2) is needed, but it has costs. Dynamic allocation adds complexity to the code in error handling and memory management, including making sure that everything allocated with calloc and family is freed when you are done with it.
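A minimal sketch of the bookkeeping Option 2) brings with it, using hypothetical pd_create and pd_destroy helpers: the allocation is checked, and one function owns the cleanup so the array is freed exactly once.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct programData {
    int *dataArray;
    size_t numberofelements;
} pd;

/* Allocate a zero-filled array of n ints. Returns 0 on success, or -1 if
 * allocation fails (in which case *data is left empty). */
int pd_create(pd *data, size_t n) {
    data->dataArray = NULL;
    data->numberofelements = 0;
    if (n == 0)
        return 0;
    int *arr = calloc(n, sizeof *arr);   /* calloc zero-fills the array */
    if (arr == NULL)
        return -1;
    data->dataArray = arr;
    data->numberofelements = n;
    return 0;
}

/* Release the array and reset the struct, so a second call is harmless. */
void pd_destroy(pd *data) {
    free(data->dataArray);
    data->dataArray = NULL;
    data->numberofelements = 0;
}
```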

Serialization and deserialization are recommended for transmitting either form (and required for option 2, because it uses a pointer). The extra rigor needed to implement them pays dividends in predictability: you know exactly what is being sent.
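On the receiving side, a minimal deserialization sketch might look like the following. It assumes a hypothetical wire format (a 4-byte big-endian count, then each element as a 4-byte big-endian two's-complement integer), rejects truncated or oversized input, and builds its own Option 2) structure. pd_deserialize and the PD_MAX_ELEMENTS limit are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct programData {
    int32_t *dataArray;
    size_t numberofelements;
} pd;

#define PD_MAX_ELEMENTS 4096u   /* hypothetical protocol limit */

/* Read 4 bytes, most significant first, independent of host byte order. */
static uint32_t get_u32_be(const unsigned char *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Map a two's-complement bit pattern back to int32_t without relying on
 * implementation-defined unsigned-to-signed conversion. */
static int32_t to_i32(uint32_t u) {
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Parse [count: u32 BE][count x i32 BE]. Returns 0 on success, or -1 if
 * the buffer is malformed or allocation fails. */
int pd_deserialize(const unsigned char *buf, size_t len, pd *out) {
    if (len < 4)
        return -1;
    uint32_t count = get_u32_be(buf);
    if (count > PD_MAX_ELEMENTS || len != 4 + (size_t)count * 4)
        return -1;
    int32_t *arr = NULL;
    if (count > 0) {
        arr = malloc((size_t)count * sizeof *arr);
        if (arr == NULL)
            return -1;
    }
    for (uint32_t i = 0; i < count; i++)
        arr[i] = to_i32(get_u32_be(buf + 4 + (size_t)i * 4));
    out->dataArray = arr;
    out->numberofelements = count;
    return 0;
}
```

The caller owns the returned array and frees it when done, which is the same bookkeeping cost described above for Option 2).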