Possible Duplicate:
Purpose of memory alignment
I read some articles on net about memory alignment and could understand that from properly aligned memory (take 2-byte alignment) we can fetch data fastly in one go.
But if we have memory like a single hardware piece, then given an address, why cannot we read 2-byte directly from that position. like:
I thought over it. I think that if the memory is in odd-even banks kind of then the theory would apply.
What am i missing ?
Your pictures describe how we (humans) visualize computer memory.
In reality, think about memory as huge matrix of bits. Each matrix column has a "reader" attached that can read/write any bit from this column. Each matrix row has a "selector", which can select the specific bit that the reader will read/write.
Therefore, this reader can read the whole selected matrix row at once. Length of this row (number of matrix columns) define how much data can be read at once. For instance, if you have 64 columns, your memory controller can read 8 bytes at once (it usually can do more than that though).
As long as you keep your data aligned, you will need less of these memory accesses. Even if you need to read just two bits, but they are located on different rows, you will need two accesses to memory instead of one.
Also, there's a whole aspect of writing, which is a different problem.
Just as you can read the whole row, you also can write the whole row. If your data isn't aligned, when you write something that is not a full row, you will need to do read-modify-write (read the old content of the row, modify the relevant part and write the new content).
In first case(single piece of hardware), if you need to read 2 bytes then the processor will have to issue two read cycles, this is because memory is byte-addressable i.e each byte is provided a unique address.
Organizing memory as banks help the CPU to fetch more data into registers in a single read cycles. This technique helps in reducing read cycles-which is a very slow process as compared to CPU's processing capacity. Thus, for a single read cycle you can read more amount of data.
Data from memory is typically delivered to the processor on a set of wires that matches the bus width. E.g., if the bus is 32 bits wide, there are 32 data wires going from the bus into the processor (along with other wires for control signals).
Inside the processor, various wires and switches deliver this data to wherever it is needed. If you read 32 aligned bits into a register, the wires can deliver the data very directly to a register (or other holding location).
If you read 8 or 16 aligned bits into a register, the wires can deliver the data the same way, and the other bits in the register are set to zero.
If you read 8 or 16 unaligned bits into a register, the wires cannot deliver the data directly. Instead, the bits must be shifted: They must go through a different set of wires, so that they can be “moved over” to line up with the wires going into the register.
In some processors, the designers have put additional wires and switches to do this moving. This can be very expensive in terms of the amount of silicon it takes. You need a lot of extra wires and switches in order to be able to move any possible unaligned bytes to desired locations. Because this is so expensive, in some processors, there is not a full shifter that can do all shifts immediately. Instead, the shifter might be able to move bits only by a byte or so per CPU cycles, and it takes several cycles to shift by several bytes. In some processors, there are no wires for this at all, so all loads and stores must be aligned.