I have a short to float cast in C++ that is bottlenecking my code.
The code translates from a hardware device buffer which is natively shorts, this represents the input from a fancy photon counter.
float factor= 1.0f/value;
for (int i = 0; i < W*H; i++)//25% of time is spent doing this
{
int value = source[i];//ushort -> int
destination[i] = value*factor;//int*float->float
}
A few details
Value should go from 0 to 2^16-1, it represents the pixel values of a highly sensitive camera
I'm on a multicore x86 machine with an i7 processor (i7 960 which is SSE 4.2 and 4.1).
Source is aligned to an 8 bit boundary (a requirement of the hardware device)
W*H is always divisible by 8, most of the time W and H are divisible by 8
This makes me sad, is there anything I can do about it?
I am using Visual Studios 2012...
You could try to approximate the expression
by an fraction
numerator/denomitator
where bothnumerator
anddenominator
areint
s. This can be done to the precision you need in your application likeThen you can do your computation in integer arithmetics like
You have to take care of overflows, perhaps you need to switch to
long
(or evenlong long
on 64bit systems) for the calculation.