SSE: why, technically, is 16-aligned data faster t

Posted 2019-09-20 08:36

Question:

Is it a bus architecture issue? How is it circumvented in i7?

I'm aware of this, I just don't think it answers the real why.

Answer 1:

The processor is built to work with data of particular sizes and alignments. When data falls outside those sizes and alignments, it effectively has to be shifted into alignment, cropped to the right width, computed on using the normal instructions, and then shifted back into place.
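To make the "shift and crop" idea concrete, here is a minimal software sketch in C. It is a simplified model, not a description of how any particular CPU handles this internally. It assumes a hypothetical memory that can only be read as aligned 32-bit words, with byte `i` of a word stored in bits `8*i` to `8*i+7`. The names `load_aligned` and `load_unaligned` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory can only be read one aligned 32-bit word
 * at a time. Byte i of a word occupies bits [8*i, 8*i+7]; this layout
 * is a convention of the model, not a property of the host machine. */
static uint32_t load_aligned(const uint32_t *mem, size_t word_index)
{
    return mem[word_index];
}

/* Read a 32-bit value starting at an arbitrary byte address.
 * The caller must ensure the word after the last touched one exists
 * whenever byte_addr is not a multiple of 4. */
static uint32_t load_unaligned(const uint32_t *mem, size_t byte_addr)
{
    size_t word = byte_addr / 4;
    unsigned shift = (unsigned)(byte_addr % 4) * 8;

    uint32_t lo = load_aligned(mem, word);
    if (shift == 0) {
        return lo; /* aligned case: one load, no fix-up work */
    }

    /* Unaligned case: the value straddles two words, so it takes a
     * second load plus shifts to line the two pieces up and an OR to
     * merge them. The shifts also crop away the unwanted bytes. */
    uint32_t hi = load_aligned(mem, word + 1);
    return (lo >> shift) | (hi << (32 - shift));
}
```

In this model the aligned path is a single load, while the unaligned path costs two loads, two shifts, and a merge. That extra work is the kind of overhead the answer is describing.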