I'm learning socket programming and am confused by what I feel is inconsistent use of htons()
and family of functions in my learning material. I'm currently reading this site which has the following code segment:
struct sockaddr_in adr_inet;
int adr_len;

memset(&adr_inet,0,sizeof adr_inet);

adr_inet.sin_family = AF_INET;
adr_inet.sin_port = ntohs(0);
adr_inet.sin_addr.s_addr = ntohl(INADDR_ANY);
adr_len = sizeof adr_inet;
A subsequent example further down at the same noted site has the following code segment:
struct sockaddr_in adr_inet;/* AF_INET */
...
/* Create an AF_INET address */
memset(&adr_inet,0,sizeof adr_inet);

adr_inet.sin_family = AF_INET;
adr_inet.sin_port = htons(9000);
memcpy(&adr_inet.sin_addr.s_addr,IPno,4);
len_inet = sizeof adr_inet;

/* Now bind the address to the socket */
z = bind(sck_inet,
         (struct sockaddr *)&adr_inet,
         len_inet);
Question:
Why is ntohs() used on adr_inet.sin_port in the first instance, but htons() in the second?
Question:
Why is neither ntohs() nor htons() used on adr_inet.sin_family?
The noted site doesn't explain why ntohs() or htons() are used in their respective examples; it only says to "note the use of" said functions.
I understand endianness and that network byte order is big-endian order. My question is more about when you want a struct sockaddr_in's members in network vs. host byte order. In the second code example, .sin_port is set to network byte order before being passed to bind(). I can see the case for passing data to this function in either network or host byte order: bind() is a "network-related" function, so maybe it needs its data in network byte order; on the other hand, bind() is executed on the host, so why shouldn't it accept data in host byte order?
The first is a mistake, but in practice works anyway.
Nowadays practically all machines use 8-bit bytes and either consistent big-endian or consistent little-endian formats. On the former both hton[sl] and ntoh[sl] are no-ops; on the latter both reverse the byte order, and thus actually do the same thing even though their intended semantics are different. Thus using the wrong one still works on all systems you're likely to run a program on.
Back when the socket API was designed this wasn't always the case; for example the then-popular PDP-11 somewhat infamously used 'middle-endian' (!) aka 'NUXI' order for 32-bit.
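To see this on your own machine, here is a minimal sketch in C. The helper names htons_matches_ntohs and port_is_stored_big_endian are made up for illustration; only htons(), ntohs() and struct sockaddr_in come from the system headers. It checks that the two 16-bit conversions agree for every value, and that a port stored with htons() ends up high byte first in memory, which is what the network stack expects.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if htons() and ntohs() give the same
 * result for every 16-bit value on this machine, 0 otherwise. */
int htons_matches_ntohs(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        if (htons((uint16_t)v) != ntohs((uint16_t)v))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if sin_port, once set with htons(),
 * holds the port's high byte first in memory (network byte order). */
int port_is_stored_big_endian(uint16_t port)
{
    struct sockaddr_in adr_inet;
    unsigned char raw[2];

    memset(&adr_inet, 0, sizeof adr_inet);
    adr_inet.sin_port = htons(port);
    memcpy(raw, &adr_inet.sin_port, sizeof raw);

    return raw[0] == (port >> 8) && raw[1] == (port & 0xFF);
}
```

Both helpers should return 1 on the big-endian and little-endian machines described above, which is why the first example's ntohs(0) behaves like htons(0) in practice. The correct choice when filling in a sockaddr_in is still htons()/htonl(), since the direction is host to network.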
Again in ancient times the Internet Protocol stack was only one of several (up to a dozen or so) competing network technologies. The family field distinguishes different types of sockaddr_* structures for these different protocols, which did not all follow the Internet 'rule' for big-endian, at least not consistently. As there was no universal network representation for family they just left it in host order, which is usually more convenient for host software.
Nowadays in practice nobody uses any families but INET, INET6, and sometimes UNIX, and the latter can be replaced by using named pipes in the filesystem, which is usually at least as good.
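As a rough illustration of the family field acting as a host-order tag, the sketch below inspects sa_family directly, with no byte-order conversion, to tell which sockaddr_* structure it is looking at. The function names family_name and is_ipv4 are hypothetical; the AF_* constants and struct sockaddr come from the standard headers.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: map the host-order family tag to a readable name. */
const char *family_name(const struct sockaddr *sa)
{
    switch (sa->sa_family) {   /* compared as-is: no ntohs() needed */
    case AF_INET:  return "AF_INET";
    case AF_INET6: return "AF_INET6";
    case AF_UNIX:  return "AF_UNIX";
    default:       return "other";
    }
}

/* Hypothetical helper: returns 1 if the address is tagged as IPv4. */
int is_ipv4(const struct sockaddr *sa)
{
    return sa->sa_family == AF_INET;
}
```

The family value never travels over the wire as part of this structure. It only tells the local socket code how to interpret the remaining bytes, so host order is the natural choice.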
adr_inet.sin_family is initialized to the value of AF_INET. This is defined in bits/socket.h (which is included by netinet/in.h in your example) as:
and then,
So AF_INET is just a way for the program to identify the associated socket address as belonging to the IPv4 (TCP/IP) family. It won't actually hold the value of an IPv4 address itself, so there's no need to perform an endian conversion on it.
Also, note that in newer versions of the C library headers, netinet/in.h has a comment that states the following:
Whereas the website you're referencing cites the older use of unsigned long and unsigned short datatypes for the conversion functions. So there's a chance you may encounter issues running code from that site if you're building against newer headers.
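For comparison, here is a hedged sketch of how the first example could be written with the fixed-width types that POSIX specifies for these functions (uint16_t for htons()/ntohs(), uint32_t for htonl()/ntohl()). The function names make_any_address and bind_any_port are invented for this illustration and do not come from the site in question. The family is left in host order, the port and address are converted host to network on the way in, and the port is converted network to host when read back from getsockname().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill an IPv4 wildcard address for a host-order port. */
struct sockaddr_in make_any_address(uint16_t host_port)
{
    struct sockaddr_in adr_inet;

    memset(&adr_inet, 0, sizeof adr_inet);
    adr_inet.sin_family = AF_INET;                /* host order, no conversion */
    adr_inet.sin_port = htons(host_port);         /* host -> network, 16 bits  */
    adr_inet.sin_addr.s_addr = htonl(INADDR_ANY); /* host -> network, 32 bits  */
    return adr_inet;
}

/* Hypothetical helper: bind a TCP socket to port 0 so the kernel picks a
 * port, then report that port in host order. Returns the socket descriptor,
 * or -1 on failure. */
int bind_any_port(uint16_t *host_port_out)
{
    struct sockaddr_in adr_inet = make_any_address(0);
    socklen_t len_inet = sizeof adr_inet;
    int sck_inet = socket(AF_INET, SOCK_STREAM, 0);

    if (sck_inet < 0)
        return -1;
    if (bind(sck_inet, (struct sockaddr *)&adr_inet, len_inet) < 0 ||
        getsockname(sck_inet, (struct sockaddr *)&adr_inet, &len_inet) < 0) {
        close(sck_inet);
        return -1;
    }
    *host_port_out = ntohs(adr_inet.sin_port);    /* network -> host */
    return sck_inet;
}
```

A caller would pass a pointer to a uint16_t, check the return value against -1, and close() the descriptor when done. Note that ntohs() appears only where a value comes back out of the structure, which is the one place in these examples where "network to host" is the right direction.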