I understand the padding that takes place between the members of a struct to ensure correct alignment of individual types. However, why does the data structure have to be a multiple of alignment of largest member? I don't understand the padding is needed at the end.
Reference:
http://en.wikipedia.org/wiki/Data_structure_alignment
Good question. Consider this hypothetical type:
struct A {
int n;
bool flag;
};
So, an object of type A
should take five bytes (four for the int plus one for the bool), but in fact it takes eight. Why?
The answer is seen if you use the type like this:
const size_t N = 100;
A a[N];
If each A
were only five bytes, then a[0]
would align but a[1]
, a[2]
and most of the other elements would not.
But why does alignment even matter? There are several reasons, all hardware-related. One reason is that recently/frequently used memory is cached in cache lines on the CPU silicon for rapid access. An aligned object smaller than a cache line always fits in a single line (but see the interesting comments appended below), but an unaligned object may straddle two lines, wasting cache.
There are actually even more fundamental hardware reasons, having to do with the way byte-addressable data is transferred down a 32- or 64-bit data bus, quite apart from cache lines. Not only will misalignment clog the bus with extra fetches (due as before to straddling), but it will also force registers to shift bytes as they come in. Even worse, misalignment tends to confuse optimization logic (at least, Intel's optimization manual says that it does, though I have no personal knowledge of this last point). So, misalignment is very bad from a performance standpoint.
It usually is worth it to waste the padding bytes for these reasons.
Update: The comments below are all useful. I recommend them.
Depending on the hardware, alignment might be necessary or just help speeding up execution.
There is a certain number of processors (ARM I believe) in which an unaligned access leads to a hardware exception. Plain and simple.
Even though typical x86 processors are more lenient, there is still a penalty in accessing unaligned fundamental types, as the processor has to do more work to bring the bits into the register before being able to operate on it. Compilers usually offer specific attributes/pragmas when packing is desirable nonetheless.
Because of virtual addressing.
"...aligning a page on a page-sized boundary lets the
hardware map a virtual address to a physical address by substituting
the higher bits in the address, rather than doing complex arithmetic."
By the way, I found the Wikipedia page on this quite well written.
If the register size of the CPU is 32 bits, then it can grab memory that is on 32 bit boundaries with a single assembly instruction. It is slower to grab 32 bits, and then get the byte that starts at bit 8.
BTW: There doesn't have to be padding. You can ask that structures be packed.