I googled it and some people say it is "to keep the same size as struct sockaddr". But the kernel will not use sockaddr directly (right?). When it uses the address, the kernel will cast it back to what it really is. So why is the zero padding needed?
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
struct sockaddr_in {
short sin_family; // address family, AF_INET for this structure
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct in_addr {
uint32_t s_addr; // 32-bit IPv4 address, network byte order; load with inet_pton()
};
The two most relevant pieces of information I could find are these.
The first, talking about a snippet of code that does not clear the bytes:
This is a bug. I see it occur occasionally. This bug can cause undefined behaviour in applications.
followed by some explanation:
Most of the net code does not use sockaddr_in, it uses sockaddr. When you use a function like sendto, you must explicitly cast sockaddr_in, or whatever address you are using, to sockaddr. sockaddr_in is the same size as sockaddr, but internally the sizes are the same because of a slight hack.
That hack is sin_zero. Really the length of useful data in sockaddr_in is shorter than sockaddr. But the difference is padded in sockaddr_in using a small buffer; that buffer is sin_zero.
and finally, a piece of information that can be found in various places:
On some architectures, it won't cause any problems not clearing sin_zero. But on other architectures it might. It's required by specification to clear sin_zero, so you must do this if you intend your code to be bug free for now and in the future.
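The usual way to follow that advice is to clear the whole structure before filling in the fields you care about. A minimal sketch, assuming a platform whose struct sockaddr_in has a sin_zero member (the POSIX text quoted further down notes it is not strictly required); make_ipv4_addr and sin_zero_is_clear are hypothetical helper names, not part of any library:

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: build an IPv4 address with every byte, including
 * sin_zero, cleared before the real fields are filled in.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad string. */
int make_ipv4_addr(struct sockaddr_in *out, const char *ip, unsigned short port)
{
    memset(out, 0, sizeof *out);      /* clears sin_zero and any other padding */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);      /* port goes in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: returns 1 if every byte of sin_zero is 0, else 0. */
int sin_zero_is_clear(const struct sockaddr_in *a)
{
    for (size_t i = 0; i < sizeof a->sin_zero; i++)
        if (a->sin_zero[i] != 0)
            return 0;
    return 1;
}
```

Calling make_ipv4_addr(&addr, "127.0.0.1", 3490) and then passing (struct sockaddr *)&addr with sizeof addr to bind() or connect() is the intended use. Doing the memset first means the result does not depend on whatever was on the stack before.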
The second one answers the question
why we need this 8 byte padding?
with the answer
Unix network programming chapter 3.2 says that, "The POSIX specification
requires only three members in the structure: sin_family, sin_addr, and
sin_port. It is acceptable for a POSIX-compliant implementation to define
additional structure members, and this is normal for an Internet socket address
structure. Almost all implementations add the sin_zero member so that all socket
address structures are at least 16 bytes in size. "
It's kinda like structure padding, maybe reserved for extra fields in the
future. You will never use it, just as commented.
which is consistent with the first link. Clearing the bytes tells the receiver "those bytes are not used on our side".
Because struct sockaddr_in needs to be cast to struct sockaddr, it has to be kept the same size. sin_zero is an unused member whose sole purpose is to pad the structure out to 16 bytes (the size of struct sockaddr). The amount of padding varies with the address family. For example:
struct sockaddr_in {
short int sin_family; // Address family, AF_INET
unsigned short int sin_port; // Port number
struct in_addr sin_addr; // Internet address
unsigned char sin_zero[8]; // For padding, to make it same size as struct sockaddr
};
Now take the Xerox NS family which has different struct members:
struct sockaddr_ns {
u_short sns_family; // Address family, AF_NS
struct ns_addr sns_addr; // the 12-byte XNS address
char sns_zero[2]; // unused except for padding
};
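The size claim is easy to check on your own machine. A small sketch, assuming a C11 compiler and a platform that defines sin_zero; the equality is a property of common implementations rather than something POSIX promises, so the static assertion simply stops the build where it does not hold. ipv4_useful_bytes is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Compile-time check (C11): the IPv4 structure is padded out to the size
 * of the generic one. This is an assumption about the platform, not a
 * POSIX guarantee. */
static_assert(sizeof(struct sockaddr_in) == sizeof(struct sockaddr),
              "sockaddr_in is not the same size as sockaddr here");

/* Hypothetical helper: number of meaningful bytes in sockaddr_in,
 * i.e. everything that comes before sin_zero. */
size_t ipv4_useful_bytes(void)
{
    return offsetof(struct sockaddr_in, sin_zero);
}
```

On the layouts shown above the useful part is 8 bytes (family, port, address) and sin_zero supplies the remaining 8.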
Structure padding occurs because the members of a structure must appear at the correct byte boundary. To achieve this, the compiler inserts padding bytes (or bits, if bit fields are in use) so that each member lands at a suitably aligned offset. Additionally, the size of the structure must be such that, in an array of the structures, every element is correctly aligned in memory.
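Compiler-inserted padding of this kind can be observed with offsetof and sizeof. A minimal sketch using a made-up struct sample (not a socket type); note that sin_zero is different in kind, since it is an explicitly declared member rather than something the compiler adds:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type: a 1-byte member followed by an int. */
struct sample {
    char tag;
    int  value;
};

/* Bytes the compiler inserted between tag and value so that value
 * starts on a boundary suitable for int. */
size_t gap_after_tag(void)
{
    return offsetof(struct sample, value) - sizeof(char);
}

/* Bytes in the struct that belong to no member (interior plus trailing). */
size_t total_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(int));
}
```

On a typical platform with a 4-byte, 4-byte-aligned int, the gap is 3 bytes, but the exact figure is implementation-defined.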
So maybe it is needed to avoid memory problems such as misaligned access.
struct sockaddr is the abstract, incomplete version of this structure with only the family. struct sockaddr_in is the IPv4 version of this structure. It only utilizes the first 8 bytes. struct sockaddr_in6 is the IPv6 version of this structure, and is larger. The padding allows smaller structures to accommodate the largest variation of this structure, so the buffer isn't undersize.
When you're passing an address to a function or system call, the extra bytes aren't really necessary. But when retrieving an address, you provide a structure address for the results. That structure needs to be the largest of all possible variations. If it were not (imagine you provided an IPv4 version but got back an IPv6 address), the results would exceed the structure and corrupt whatever's next door in memory.
To avoid this memory corruption, most of the related functions take the structure size as a parameter. But now, when you pass that IPv4 version and its too-small size, you end up with an incompletely populated IPv6 version of the structure. Looking at the family, you can see it's IPv6. But if you cast the structure to IPv6 and try to use it, the contents are wrong because the structure was too small to contain full, valid data.
Padding the smaller structure avoids these snags, and avoids any related potential security problems.
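One way to guard against the truncated-result case described above is to compare the length reported by the call with the size the address family requires before casting. A sketch only: addr_is_complete is a hypothetical helper, and it handles just AF_INET and AF_INET6. For the receiving buffer itself, POSIX provides struct sockaddr_storage, which is sized to hold any supported address family.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if the len bytes at sa are enough to hold
 * the full structure that sa_family claims, 0 otherwise. Unknown families
 * are treated as incomplete. */
int addr_is_complete(const struct sockaddr *sa, socklen_t len)
{
    /* The family field itself must have been written. */
    if (len < (socklen_t)(offsetof(struct sockaddr, sa_family)
                          + sizeof sa->sa_family))
        return 0;

    switch (sa->sa_family) {
    case AF_INET:
        return len >= (socklen_t)sizeof(struct sockaddr_in);
    case AF_INET6:
        return len >= (socklen_t)sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}
```

A typical use would be to declare a struct sockaddr_storage, pass its address and size to accept() or getpeername(), and call addr_is_complete on the result before casting it to struct sockaddr_in or struct sockaddr_in6.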