
Half-precision floating-point arithmetic on Intel

Published 2019-03-31 06:53

Question:

Is it possible to perform half-precision floating-point arithmetic on Intel chips?

I know how to load/store/convert half-precision floating-point numbers [1] (see the sketch below), but I do not know how to add/multiply them without first converting them to single-precision floating-point numbers.

[1] https://software.intel.com/en-us/articles/performance-benefits-of-half-precision-floats
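
The conversion part looks something like this with F16C intrinsics (a minimal sketch, assuming a CPU with F16C, i.e. Ivy Bridge or later, and compiling with something like gcc -mf16c; the function names are mine):

```c
#include <immintrin.h>
#include <stdint.h>

void halves_to_floats(const uint16_t *src, float *dst)
{
    /* Load 8 packed half-precision values and widen them to 8 floats (VCVTPH2PS). */
    __m128i h = _mm_loadu_si128((const __m128i *)src);
    __m256  f = _mm256_cvtph_ps(h);
    _mm256_storeu_ps(dst, f);
}

void floats_to_halves(const float *src, uint16_t *dst)
{
    /* Narrow 8 floats back to 8 halves with round-to-nearest (VCVTPS2PH). */
    __m256  f = _mm256_loadu_ps(src);
    __m128i h = _mm256_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_si128((__m128i *)dst, h);
}
```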

Answer 1:

Is it possible to perform half-precision floating-point arithmetic on Intel chips?

Yes, apparently the on-chip GPU in Skylake and later has hardware support for FP16 and FP64, as well as FP32. With new enough drivers you can use it via OpenCL (see the kernel sketch below).

On earlier chips you get about the same throughput for FP16 as for FP32 (probably just converting on the fly for nearly free), but on SKL / KBL chips FP16 gets about double the throughput of FP32 for GPGPU Mandelbrot (note the log scale on the Mpix/s axis of the chart in the linked benchmark).

The gain in FP64 (double) performance was huge, too.
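
To give a feel for what using it looks like, here is a sketch of an OpenCL C kernel doing native half-precision multiplies. This only works if the device reports the cl_khr_fp16 extension (without it, half is a storage-only type in OpenCL); the kernel name and arguments are illustrative:

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void mul_halves(__global const half *a,
                         __global const half *b,
                         __global half *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] * b[i];   /* native half multiply on SKL+ iGPUs */
}
```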


But on the IA (Intel Architecture) cores, no: even with AVX-512 there's no hardware support for anything other than converting half-precision values to and from single precision.

You could of course implement half-precision arithmetic in software, possibly even in SIMD registers, so technically the answer to the question you asked is still "yes". But it won't be faster than using the F16C VCVTPH2PS / VCVTPS2PH conversion instructions plus hardware-supported packed-single vmulps / vfmadd132ps.

So: technically yes, but not in a useful way, except for GPGPU. In x86 code, use the HW-supported SIMD conversion to/from float / __m256, as in the sketch below.
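
A minimal sketch of that convert/compute/convert pattern, assuming F16C plus FMA (Haswell or later) and compiling with something like gcc -mf16c -mfma; the function name is mine:

```c
#include <immintrin.h>
#include <stdint.h>

/* out[i] = a[i] * b[i] + c[i] for 8 half-precision elements at a time. */
void fma_halves_8(const uint16_t *a, const uint16_t *b,
                  const uint16_t *c, uint16_t *out)
{
    /* Widen each group of 8 halves to 8 floats (VCVTPH2PS). */
    __m256 va = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)a));
    __m256 vb = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)b));
    __m256 vc = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)c));

    /* Do the actual math at single precision (VFMADD...PS: a*b + c). */
    __m256 r = _mm256_fmadd_ps(va, vb, vc);

    /* Round back to half precision (VCVTPS2PH). */
    _mm_storeu_si128((__m128i *)out,
                     _mm256_cvtps_ph(r, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
}
```

Note that doing the arithmetic at float precision and rounding to half only at the end gives slightly different (usually more accurate) results than true half-precision arithmetic would, because intermediates keep the extra precision.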