This question already has an answer here:
- Python Numpy Data Types Performance (2 answers)
In my benchmark using numpy 1.12.0, calculating dot products with float32 ndarrays is much faster than with the other data types:
In [2]: import numpy as np

In [3]: f16 = np.random.random((500000, 128)).astype('float16')
In [4]: f32 = np.random.random((500000, 128)).astype('float32')
In [5]: uint = np.random.randint(1, 60000, (500000, 128)).astype('uint16')
In [7]: %timeit np.einsum('ij,ij->i', f16, f16)
1 loop, best of 3: 320 ms per loop
In [8]: %timeit np.einsum('ij,ij->i', f32, f32)
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10 loops, best of 3: 19 ms per loop
In [9]: %timeit np.einsum('ij,ij->i', uint, uint)
10 loops, best of 3: 43.5 ms per loop
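One quick sanity check (a sketch, not from the original post) is to upcast the float16 data to float32 before the einsum and compare: if the upcast path gives essentially the same answer, the slowdown lies in the half-precision arithmetic itself rather than in einsum. A smaller array is used here for brevity:

```python
import numpy as np

rng = np.random.RandomState(0)
f16 = rng.random_sample((1000, 128)).astype('float16')

# Row-wise dot products computed directly in float16.
direct = np.einsum('ij,ij->i', f16, f16)

# The same computation after upcasting to float32, which most CPUs
# handle natively, unlike half precision.
upcast = np.einsum('ij,ij->i', f16.astype('float32'),
                   f16.astype('float32'))

print(direct.dtype)   # float16 -- einsum keeps the input dtype
print(upcast.dtype)   # float32
print(np.allclose(direct, upcast, rtol=1e-1))  # True
```

Timing the second line with %timeit should show it running at roughly float32 speed (plus the cost of the two `astype` copies).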
I've tried profiling einsum, but it just delegates all the computing to a C function, so I don't know what the main reason for this performance difference is.
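For reference, the profiling dead end can be reproduced like this (a sketch; the exact name of einsum's C entry point varies between numpy versions):

```python
import cProfile
import io
import pstats

import numpy as np

a = np.random.random((1000, 128)).astype('float16')

# Profile a single einsum call at the Python level.
profiler = cProfile.Profile()
profiler.enable()
result = np.einsum('ij,ij->i', a, a)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)

# Essentially all the time is attributed to einsum's built-in C
# implementation, so Python-level profiling can't explain the
# per-dtype difference.
print(stream.getvalue())
```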
My tests with your f16 and f32 arrays show that f16 is 5-10x slower for all calculations. It's only when doing byte-level operations like array copies that the more compact nature of float16 shows any speed advantage.

https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
is the section of the gcc docs about half floats (fp16). With the right processor and the right compiler switches, it may be possible to build numpy in a way that speeds up these calculations. We'd also have to check whether the numpy .h files have any provision for special handling of half floats.

Earlier questions that may be good enough to serve as duplicate references:
- Python Numpy Data Types Performance
- Python numpy float16 datatype operations, and float8?
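To make the byte-level point above concrete (a small sketch, with a reduced array size): float16 stores two bytes per element versus four for float32, so pure memory operations such as copying move exactly half the data:

```python
import numpy as np

f16 = np.zeros((10000, 128), dtype='float16')
f32 = np.zeros((10000, 128), dtype='float32')

# Two bytes per element versus four: exactly half the footprint.
print(f16.nbytes)  # 2560000
print(f32.nbytes)  # 5120000

# A byte-for-byte copy of the float16 array therefore touches half
# as much memory as the equivalent float32 copy.
f16_copy = f16.copy()
print(f16_copy.nbytes == f16.nbytes)  # True
```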