Why are SIMD instructions not used in the kernel?

Posted 2019-06-21 07:48

Question:

I couldn't find much use of SIMD instructions (like SSE/AVX) in the kernel (except for one place, where they are used to speed up the parity computation for RAID6).

Q1) Is there a specific reason for this, or is it just a lack of use cases?

Q2) What needs to be done today if I want to use SIMD instructions in, say, a device driver?

Q3) How hard would it be to incorporate a framework like ISPC into the kernel (just for experimentation)?

Answer 1:

Saving/restoring FPU state (which includes the SIMD vector registers) is more expensive than saving/restoring just the integer GP registers. It's simply not worth the cost in most cases.
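
To put rough numbers on that cost, here is a back-of-the-envelope sketch in C that adds up the architectural register state on x86-64. The function names are made up for illustration. The totals are lower bounds: a real XSAVE area can be larger because of padding and other state components, and CPUID leaf 0xD reports the actual size on a given machine.

```c
#include <stddef.h>

/* Sizes in bytes of architectural register state on x86-64. */
enum {
    GP_REGS_BYTES      = 16 * 8,   /* 16 integer registers, 64 bits each */
    FXSAVE_AREA_BYTES  = 512,      /* legacy x87 + SSE area (includes xmm0-15) */
    XSAVE_HEADER_BYTES = 64,       /* header that follows the legacy area */
    YMM_HI128_BYTES    = 16 * 16,  /* upper halves of ymm0-15 (AVX) */
    OPMASK_BYTES       = 8 * 8,    /* k0-k7 (AVX-512) */
    ZMM_HI256_BYTES    = 16 * 32,  /* upper halves of zmm0-15 (AVX-512) */
    HI16_ZMM_BYTES     = 16 * 64   /* zmm16-31 (AVX-512) */
};

/* x87 + SSE only. */
static size_t fpu_state_sse(void)
{
    return FXSAVE_AREA_BYTES;
}

/* x87 + SSE + AVX. */
static size_t fpu_state_avx(void)
{
    return FXSAVE_AREA_BYTES + XSAVE_HEADER_BYTES + YMM_HI128_BYTES;
}

/* x87 + SSE + AVX + AVX-512, ignoring padding and unrelated components. */
static size_t fpu_state_avx512_min(void)
{
    return fpu_state_avx() + OPMASK_BYTES + ZMM_HI256_BYTES + HI16_ZMM_BYTES;
}
```

Even the SSE-only figure is four times the 128 bytes of integer registers, and the kernel has to deal with the integer registers on entry anyway.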

In Linux kernel code, all you have to do is call kernel_fpu_begin() / kernel_fpu_end() around your code. This is what the RAID drivers do. See http://yarchive.net/comp/linux/kernel_fp.html.
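
A minimal sketch of that pattern, under some assumptions: xor_block(), xor_block_simd() and xor_block_scalar() are hypothetical names, and the SIMD body is only a placeholder (real kernel code, like the RAID6 routines, would use hand-written SSE/AVX assembly there). The #else branch holds userspace stand-ins so the sketch can be compiled outside the kernel; it is not how the kernel implements these functions.

```c
#ifdef __KERNEL__
#include <linux/types.h>
#include <asm/fpu/api.h>  /* kernel_fpu_begin(), kernel_fpu_end(), irq_fpu_usable() */
#else
/* Userspace stand-ins, for illustration only. */
#include <stdbool.h>
#include <stddef.h>
static int fpu_sections_open;
static bool irq_fpu_usable(void) { return true; }
static void kernel_fpu_begin(void) { fpu_sections_open++; }
static void kernel_fpu_end(void)   { fpu_sections_open--; }
#endif

/* Plain integer fallback for contexts where the FPU must not be touched. */
static void xor_block_scalar(unsigned char *dst, const unsigned char *src,
                             size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Hypothetical SIMD routine. Placeholder body: a real driver would put its
 * SSE/AVX assembly here instead of calling the scalar loop. */
static void xor_block_simd(unsigned char *dst, const unsigned char *src,
                           size_t len)
{
    xor_block_scalar(dst, src, len);
}

static void xor_block(unsigned char *dst, const unsigned char *src, size_t len)
{
    if (!irq_fpu_usable()) {
        /* FPU not usable in this context: stay on integer registers. */
        xor_block_scalar(dst, src, len);
        return;
    }

    kernel_fpu_begin();   /* preserves user FPU state, disables preemption */
    xor_block_simd(dst, src, len);  /* must not sleep in here */
    kernel_fpu_end();     /* re-enables preemption */
}
```

Because preemption is disabled between the two calls, the bracketed region should be kept short and must not sleep.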


x86 doesn't have any future-proof way to save/restore just one or a couple of vector registers. (Other than manually saving/restoring an xmm register using legacy SSE instructions, which can cause SSE/AVX transition stalls on Intel CPUs if user-space left the upper halves of any ymm/zmm registers dirty.)
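
For illustration only, this is roughly what such a manual save/restore looks like. It is a userspace sketch, assuming GCC/Clang inline assembly on x86-64; copy16_preserving_xmm0() is a made-up name, and kernel-specific requirements (for example around preemption) are not shown. It is not a recommendation, for the reasons given above.

```c
#include <string.h>

/* Copy 16 bytes through xmm0 while leaving xmm0's previous contents intact. */
static void copy16_preserving_xmm0(unsigned char *dst, const unsigned char *src)
{
#if defined(__x86_64__)
    unsigned char saved[16];   /* spill slot for the current xmm0 value */

    __asm__ volatile(
        "movups %%xmm0, (%[save])\n\t"  /* save the low 128 bits of the register */
        "movups (%[s]), %%xmm0\n\t"     /* the actual work: load 16 bytes ... */
        "movups %%xmm0, (%[d])\n\t"     /* ... and store them */
        "movups (%[save]), %%xmm0"      /* restore the saved value */
        :
        : [save] "r"(saved), [s] "r"(src), [d] "r"(dst)
        : "memory");
#else
    /* Non-x86 builds: plain copy so the sketch still compiles. */
    memcpy(dst, src, 16);
#endif
}
```

Only legacy SSE encodings are used, so bits above 128 in the corresponding ymm/zmm register are left untouched. That is exactly why this keeps working with wider vectors, and also why it can trigger the transition stalls.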

The reason legacy SSE works is that some Windows drivers were already doing this when Intel wanted to introduce AVX, so they invented that transition-penalty stuff instead of having legacy SSE instructions zero the upper 128b of ymm registers. (See this for more detail on that design decision.) So basically we can blame Windows binary-only drivers for the SSE/AVX transition-penalty mess.

IDK about non-x86 architectures, or whether their existing SIMD instruction sets have a future-proof way to save/restore a register that will continue to work with longer vectors. ARM32 might, if extensions continue the pattern of using multiple 32-bit FP registers as a single wider register (e.g. q2 is composed of s8 through s11). Saving/restoring a couple of q registers should then be future-proof if a 256-bit NEON extension simply lets you use 2 q registers as one 256-bit register, or if the new wider vectors are separate and don't extend the existing registers.
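
The ARM32 aliasing mentioned above can be written down as a small mapping. This is a sketch with made-up names (struct q_alias, q_to_alias()): each q register overlays two d registers, and only q0 through q7 also overlay single-precision s registers, since there are just 32 of those.

```c
/* Which narrower registers a 128-bit q register overlays on ARM32. */
struct q_alias {
    int first_d, last_d;   /* 64-bit d registers: d<2n> .. d<2n+1> */
    int first_s, last_s;   /* 32-bit s registers: s<4n> .. s<4n+3>, or -1 */
};

/* Fill *out for register q<q>. Returns 0 on success, -1 if q is not 0..15. */
static int q_to_alias(int q, struct q_alias *out)
{
    if (q < 0 || q > 15)
        return -1;

    out->first_d = 2 * q;
    out->last_d  = 2 * q + 1;

    if (q <= 7) {
        out->first_s = 4 * q;
        out->last_s  = 4 * q + 3;
    } else {
        /* q8-q15 have no single-precision aliases. */
        out->first_s = -1;
        out->last_s  = -1;
    }
    return 0;
}
```

For q2 this gives d4 through d5 and s8 through s11, matching the example in the answer.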