AVX feature detection using SIGILL versus CPU prob

Posted 2019-07-04 05:03

Question:

I'm trying to determine an efficient method for detecting the availability of AVX and AVX2 on Intel and AMD processors. I was kind of surprised to learn it was closer to SSE and XSAVE when reading the Intel Software Developer Manual, Volume I (MANAGING STATE USING THE XSAVE FEATURE SET, p. 310).

Intel posts some code for detecting AVX availability at Is AVX enabled? The code is shown below and it's not too painful. The problem is that Visual Studio is a pain point, because for X64 we need to move the code out of C/C++ files and into ASM files.

Others seem to be taking the SIGILL approach to detecting AVX availability. Or they are unwittingly using the SIGILL method. See, for example, SIGILL on AVX instruction.

My question is, is it safe to use the SIGILL method to detect AVX availability? Here, "safe" means an AVX instruction will not generate a SIGILL when the CPU and OS support AVX; and it will generate a SIGILL otherwise.


The code below is for 32-bit machines and it's from the Intel blog Is AVX enabled? The thing that worries me is manipulating the control registers. Reading and writing some X86 and ARM control registers sometimes requires super user/administrator privileges. It's the reason I prefer a SIGILL (and avoid control registers).

; int isAvxSupported();
isAvxSupported proc

  xor eax, eax
  cpuid
  cmp eax, 1           ; does CPUID support eax = 1?
  jb not_supported

  mov eax, 1
  cpuid
  and ecx, 018000000h  ; check bit 27 (OSXSAVE: OS uses XSAVE/XRSTOR)
  cmp ecx, 018000000h  ; and bit 28   (AVX supported by CPU)
  jne not_supported

  xor ecx, ecx         ; XFEATURE_ENABLED_MASK/XCR0 register number = 0
  xgetbv               ; XFEATURE_ENABLED_MASK register is in edx:eax
  and eax, 110b
  cmp eax, 110b        ; check that the OS saves/restores SSE and AVX state at context switch
  jne not_supported

supported:
  mov eax, 1
  ret

not_supported:
  xor eax, eax
  ret

isAvxSupported endp

Answer 1:

A bit of theory first.

In order to use the AVX instruction set, a few conditions must be met:

  1. CR4.OSXSAVE[bit 18] must be 1.
    This flag is set by the OS to signal the processor that it supports the xsave extensions.
    The xsave extensions are the only way to save the AVX state (fxsave doesn't save the ymm registers) and thus the OS must support them.

  2. XCR0.SSE[bit 1] and XCR0.AVX[bit 2] must be 1.
    These flags are set by the OS to signal the processor that it supports saving and restoring the SSE and AVX states (through xsave).

  3. CPUID.1:ECX.AVX[bit 28] = 1
    Of course, the processor must support the AVX extensions in the first place.

All of these registers are readable from user mode except CR4.
Fortunately, the bit CR4.OSXSAVE is reflected in CPUID.1:ECX.OSXSAVE[bit 27] and thus all information is user-mode accessible. No privileged instructions are involved.

In order to use the AVX extensions both hardware (CPUID.1:ECX.AVX and CPUID.1:ECX.XSAVE) and OS (CPUID.1:ECX.OSXSAVE, XCR0.SSE and XCR0.AVX) support must be present.
Since the OS signals its support for xsave (CPUID.1:ECX.OSXSAVE) only in the presence of the hardware support (CPUID.1:ECX.XSAVE), testing the former is enough.
For the AVX extensions, testing CPUID.1:ECX.AVX is still recommended as the OS may set XCR0.AVX even if AVX is not supported.

This leads to Intel's official, and strongly recommended, algorithm, which is exactly the one you posted.
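
For what it's worth, the same three checks can be written in C without a separate ASM file. The following is only a minimal sketch, assuming GCC or Clang on x86/x86-64 (it relies on `<cpuid.h>` and GNU inline assembly); `avx_usable_by_cpuid` and `read_xcr0` are names made up for this example. With Visual Studio, the `__cpuid` and `_xgetbv` intrinsics should be able to play the same role, though that variant is not shown here.

```c
#include <cpuid.h>   /* __get_cpuid (GCC/Clang only) */
#include <stdint.h>

/* Hypothetical helper: read XCR0 with XGETBV (ECX = 0).
   Only call this after CPUID.1:ECX.OSXSAVE has been confirmed,
   otherwise XGETBV itself raises #UD. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical name: returns 1 if AVX is usable, 0 otherwise. */
int avx_usable_by_cpuid(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if leaf 1 is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Bit 27 = OSXSAVE (mirrors CR4.OSXSAVE), bit 28 = AVX. */
    const unsigned int need = (1u << 27) | (1u << 28);
    if ((ecx & need) != need)
        return 0;

    /* XCR0 bit 1 = SSE state, bit 2 = AVX state, both enabled by the OS. */
    return (read_xcr0() & 0x6u) == 0x6u;
}
```

Nothing here touches a privileged register: CPUID and XGETBV are both executable from user mode.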


Catching exceptions to detect support for the AVX extensions also works, provided you can guarantee that the exception caught is #UD.
For example, by executing vzeroall the only possible exceptions are #UD and #NM.
The first one, #UD, is raised only when:

If XCR0[2:1] ≠ ‘11b’.
If CR4.OSXSAVE[bit 18]=0.
If CPUID.01H.ECX.AVX[bit 28]=0.
If VEX.vvvv ≠ 1111B.

So unless you have a broken assembler/compiler, this is exactly equivalent to the conditions stated at the beginning.

The latter, #NM, is raised as part of an OS optimisation for saving the AVX state lazily; the OS handles it itself, so it is never exposed to user-mode programs.

Therefore, catching SIGILL on vzeroall or a similar instruction would also work.
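
To make the SIGILL route concrete, here is a rough sketch of such a probe, assuming a POSIX system and GCC or Clang on x86/x86-64. It uses vzeroupper rather than vzeroall: both share the same opcode and, as far as I know, the same #UD conditions, but vzeroupper leaves the low 128 bits of the registers alone, so the compiler does not need to be told about clobbered xmm registers. The names `avx_usable_by_sigill`, `on_sigill` and `probe_env` are invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction/sigsetjmp under strict -std= modes */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;     /* hypothetical name: jump target for the handler */

/* Handler: abandon the faulting instruction and return to the probe. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical name: returns 1 if an AVX instruction executed,
   0 if it raised SIGILL, -1 if the handler could not be installed. */
int avx_usable_by_sigill(void)
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0) {
        __asm__ volatile ("vzeroupper");  /* #UD (SIGILL) if AVX is unusable */
        ok = 1;
    }

    sigaction(SIGILL, &old, NULL);        /* put the previous handler back */
    return ok;
}
```

One design caveat: the probe swaps the process-wide SIGILL handler for a moment, so it is probably best run once at start-up, before other threads are created.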



Tags: c assembly avx