I always use double for calculations, but double offers far more accuracy than I need (or than makes sense, given that most of the calculations I do are approximations to begin with).
But since the processor is already 64-bit, I do not expect that using a type with fewer bits will be of any benefit.
Am I right or wrong? How would I optimize for speed? (I understand that smaller types would be more memory-efficient.)
Here is the test:
#include <cmath>
#include <ctime>
#include <cstdio>

// Allocate an m-by-n matrix as one contiguous block plus a table of row pointers.
template<typename T>
void creatematrix(int m, int n, T **&M) {
    M = new T*[m];
    T *M_data = new T[m * n]();   // value-initialize so the loop reads defined values
    for (int i = 0; i < m; ++i)
        M[i] = M_data + i * n;
}

int main() {                      // main must return int; "void main" is non-standard
    clock_t start, end;
    double diffs;
    const int N = 4096;
    const int rep = 8;
    float **m1, **m2;
    creatematrix(N, N, m1);
    creatematrix(N, N, m2);

    start = clock();
    for (int k = 0; k < rep; k++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                m1[i][j] = std::sqrt(m1[i][j] * m2[i][j] + 0.1586);
        }
    }
    end = clock();

    diffs = (end - start) / (double)CLOCKS_PER_SEC;
    printf("time = %f\n", diffs);

    delete[] m1[0];               // free the contiguous data block first,
    delete[] m1;                  // then the row-pointer table
    delete[] m2[0];
    delete[] m2;
    getchar();
    return 0;
}
There was no time difference between double and float; however, when the square root is left out, float is twice as fast.
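One caveat for the float run (the helper name below is hypothetical): the literal 0.1586 has type double, so the product is promoted to double, the double overload of sqrt is selected, and the result is narrowed back to float on assignment. A sketch of a genuinely all-float inner loop, assuming that is what the test intends to measure:

#include <cmath>

// All-float inner loop: the 'f' suffix keeps the constant in single
// precision, so the whole expression stays float and the float
// overload of std::sqrt is called (no widening to double and back).
void updatef(float **m1, float **m2, int N) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m1[i][j] = std::sqrt(m1[i][j] * m2[i][j] + 0.1586f);
}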
There are a couple of ways in which floats can be faster:

- DIVSS (float division) takes 7 clock cycles, whereas DIVSD (double division) takes 8-14 (source: Agner Fog's instruction tables).
- Special functions (log, sin, etc.) can use lower-degree polynomial approximations: e.g. the openlibm implementation of log uses a degree 7 polynomial, whereas logf only needs degree 4.
- If you need higher intermediate precision, with float you can simply promote to double, whereas for double you need either software double-double arithmetic or the slower long double (see the sketch after this list).
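As an illustration of that last point (a sketch; the function name is mine): accumulating float inputs in a double costs only a cheap widening conversion per element, whereas getting precision beyond double requires software tricks:

#include <cstddef>

// Summing floats in a double accumulator: each element is widened by a
// single conversion instruction (CVTSS2SD on x86), buying extra
// precision in the running sum. Doing the equivalent for double inputs
// would require double-double arithmetic or the slower x87 long double.
double sum_floats(const float *x, std::size_t n) {
    double acc = 0.0;                 // wider accumulator
    for (std::size_t i = 0; i < n; ++i)
        acc += x[i];                  // float -> double promotion
    return acc;
}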
Note that these points hold for 32-bit architectures as well: unlike integers, there's nothing particularly special about having the size of the floating-point format match your architecture, i.e. on most machines doubles are just as "native" as floats.