Why does mode_t use 4 bytes?

I've just read about mode_t that it basically stores the following information:

7 boolean values for the file type (S_IFREG, S_IFDIR, S_IFCHR, S_IFBLK, S_IFIFO, S_IFLNK, S_IFSOCK)
3*3 = 9 boolean values for the access permissions (read, write and execute for owner, group and others)

So it needs 16 bits = 2 bytes. I guess you could even get by with one bit less for the file type, as a file has to be exactly one of: a regular file, a directory, a character or block device, a socket, a symbolic link, or a pipe. Or do other file types exist?

So I've just checked the size of mode_t with

printf("Size: %zu bytes\n", sizeof(mode_t));

It uses 4 bytes. Why does it use 4 bytes? Is there any additional information I didn't notice?

edit: I've just found that mode_t is defined in ptypes.inc:

type mode_t = cuint32;

cuint32 is a 32-bit unsigned integer, defined in ctypes.inc:

type cuint32 = LongWord;

Perhaps this helps with the answer.

Let's look at what a "dumb" compiler would do when given the following code:

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv) {
  uint16_t test1 = 0x1122;
  uint32_t test2 = 0x11223344;
  if (test1 & 0x0100)
    printf("yay1.\n");
  if (test2 & 0x00010000)
    printf("yay2.\n");
}

This seems like a likely use case for values of type mode_t, checking if a flag is set. Now we compile it with gcc -O0 and check the generated assembly:

0000000000000000 <main>:
            ...
   f:   66 c7 45 fe 22 11       movw   $0x1122,-0x2(%rbp)
  15:   c7 45 f8 44 33 22 11    movl   $0x11223344,-0x8(%rbp)
  1c:   0f b7 45 fe             movzwl -0x2(%rbp),%eax  ; load test1 into %eax
  20:   25 00 01 00 00          and    $0x100,%eax
  25:   85 c0                   test   %eax,%eax
            ...
  33:   8b 45 f8                mov    -0x8(%rbp),%eax  ; load test2 into %eax
  36:   25 00 00 01 00          and    $0x10000,%eax
  3b:   85 c0                   test   %eax,%eax
            ...

See how the special movzwl instruction is needed to load the 16-bit value? This is because the value has to be zero-extended by two additional bytes to fill the 32-bit register. The instruction is also one byte longer than the plain mov used for the 32-bit value (4 bytes versus 3 in the listing above). This might have a tiny impact on performance, and it might increase the executable size by a few bytes, which by itself wouldn't be too bad.

However, a 16-bit value would bring no real advantage either: it would usually occupy 32 bits of storage anyway because of alignment padding. That makes it clear why the designers chose a 32-bit type here, which the CPU can load without any extension step.