If I write
int main(int argc, char *argv[])
{
int temp[50][3];
return &temp[argc] - &temp[0];
}
and compile it with Visual C++, I get back:
009360D0 55 push ebp
009360D1 8B EC mov ebp,esp
009360D3 8B 45 08 mov eax,dword ptr [argc]
009360D6 8D 0C 40 lea ecx,[eax+eax*2]
009360D9 B8 AB AA AA 2A mov eax,2AAAAAABh
009360DE C1 E1 02 shl ecx,2
009360E1 F7 E9 imul ecx
009360E3 D1 FA sar edx,1
009360E5 8B C2 mov eax,edx
009360E7 C1 E8 1F shr eax,1Fh
009360EA 03 C2 add eax,edx
009360EC 5D pop ebp
009360ED C3 ret
Why am I getting an imul
instruction here instead of just bit shifts, etc.? I find this pretty annoying, because I'm doing pointer arithmetic like this in a tight loop, and I'm suspecting imul
is killing its performance. In any case, it shouldn't be necessary.
Is there a good way to prevent it in general and instead replace it with cheaper operations?
Update:
In my original program, I tried adding a dummy variable to make the size of each element a multiple of 4 instead of 3 so the compiler can use bit-shifting instead of division.
The result? Even though the data structure was bigger, the running time of the program decreased from 9.2 seconds to 7.4 seconds.
So yes, this is indeed very slow.
The multiplication is to divide by 3 (the size of each inner array), using the fact that
0x2AAAAAAB
is 231/3. You can't do that with a small number of shifts and adds; multiplication really is the fastest option.On most modern platforms, integer multiplication is usually as fast as simpler operations, so it may be the fastest option even when it could be replaced by a couple of shifts and adds. When you have performance issues, always measure to find the real bottlenecks; they're often in the places you least suspect.
On platforms where multiplication really is expensive: avoid arrays of awkwardly sized data structures; and avoid subtracting pointers, since that requires division by the object size.