I'm trying to determine an efficient method for detecting the availability of AVX and AVX2 on Intel and AMD processors. I was kind of surprised to learn it was closer to SSE and XSAVE when reading the Intel Software Developer Manual, Volume I (MANAGING STATE USING THE XSAVE FEATURE SET, p. 310).
Intel posts some code for detecting AVX availability at Is AVX enabled? The code is shown below, and it's not too painful. The problem is that Visual Studio is a pain point, because we need to move code out of C/C++ files and into ASM files for X64.
Others seem to be taking the SIGILL approach to detecting AVX availability. Or they are unwittingly using the SIGILL method. See, for example, SIGILL on AVX instruction.
My question is, is it safe to use the SIGILL method to detect AVX availability? Here, "safe" means an AVX instruction will not generate a SIGILL when the CPU and OS support AVX, and it will generate a SIGILL otherwise.
The code below is for 32-bit machines, and it's from the Intel blog Is AVX enabled? The thing that worries me is manipulating the control registers. Reading and writing some X86 and ARM control registers sometimes requires super user/administrator privileges. It's the reason I prefer a SIGILL (and avoid control registers).
; int isAvxSupported();
isAvxSupported proc
    xor eax, eax
    cpuid
    cmp eax, 1              ; does CPUID support eax = 1?
    jb not_supported
    mov eax, 1
    cpuid
    and ecx, 018000000h     ; check bit 27 (OS uses XSAVE/XRSTOR)
    cmp ecx, 018000000h     ; and bit 28 (AVX supported by CPU)
    jne not_supported
    xor ecx, ecx            ; XFEATURE_ENABLED_MASK/XCR0 register number = 0
    xgetbv                  ; XFEATURE_ENABLED_MASK register is in edx:eax
    and eax, 110b
    cmp eax, 110b           ; check the AVX registers restore at context switch
    jne not_supported
supported:
    mov eax, 1
    ret
not_supported:
    xor eax, eax
    ret
isAvxSupported endp
A bit of theory first.
To use the AVX instruction set, a few conditions must be met:
CR4.OSXSAVE[bit 18] must be 1.
This flag is set by the OS to signal the processor that it supports the xsave extensions.
The xsave extensions are the only way to save the AVX state (fxsave doesn't save the ymm registers) and thus the OS must support them.
XCR0.SSE[bit 1] and XCR0.AVX[bit 2] must be 1.
These flags are set by the OS to signal the processor that it supports saving and restoring the SSE and AVX states (through xsave).
CPUID.1:ECX.AVX[bit 28] = 1
Of course, the processor must support the AVX extensions in the first place.
All of these registers except CR4 are readable from user mode (XCR0 through xgetbv with ecx = 0).
Fortunately, the bit CR4.OSXSAVE is reflected in CPUID.1:ECX.OSXSAVE[bit 27], and thus all the information is accessible from user mode.
No privileged instructions are involved.
To use the AVX extensions, both hardware support (CPUID.1:ECX.AVX and CPUID.1:ECX.XSAVE) and OS support (CPUID.1:ECX.OSXSAVE, XCR0.SSE and XCR0.AVX) must be present.
Since the OS signals its support for xsave (CPUID.1:ECX.OSXSAVE) only when the hardware supports it (CPUID.1:ECX.XSAVE), testing the former is enough.
For the AVX extensions, testing CPUID.1:ECX.AVX is still recommended, as the OS may set XCR0.AVX even if AVX is not supported.
This leads to Intel's official, and strongly recommended, algorithm, which is exactly the one you posted.
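If moving code into a separate ASM file is the pain point, the same three checks can probably be written in C with compiler intrinsics. The following is only a sketch under stated assumptions: the function names avx_usable_from_bits and is_avx_supported are made up for illustration, the MSVC branch relies on __cpuid from <intrin.h> and _xgetbv from <immintrin.h>, and the GCC/Clang branch relies on __get_cpuid from <cpuid.h> plus a one-line inline asm for xgetbv. On any other compiler or architecture it simply reports 0.

```c
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>     /* __cpuid */
#include <immintrin.h>  /* _xgetbv */
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>      /* __get_cpuid */
#endif

/* Pure decision logic (hypothetical helper): given CPUID.1:ECX and XCR0,
   return 1 if AVX is usable, 0 otherwise. */
int avx_usable_from_bits(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    /* CPUID.1:ECX.OSXSAVE[bit 27] and CPUID.1:ECX.AVX[bit 28] */
    const uint32_t osxsave_and_avx = (1u << 27) | (1u << 28);
    /* XCR0.SSE[bit 1] and XCR0.AVX[bit 2] */
    const uint64_t sse_and_avx_state = (1u << 1) | (1u << 2);

    if ((cpuid1_ecx & osxsave_and_avx) != osxsave_and_avx)
        return 0;
    return (xcr0 & sse_and_avx_state) == sse_and_avx_state;
}

/* Reads CPUID and XCR0 from user mode and applies the logic above. */
int is_avx_supported(void)
{
    uint32_t ecx = 0;
    uint64_t xcr0 = 0;

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuid(regs, 0);
    if (regs[0] < 1)                /* does CPUID support leaf 1? */
        return 0;
    __cpuid(regs, 1);
    ecx = (uint32_t)regs[2];
    if (!(ecx & (1u << 27)))        /* xgetbv is only valid if OSXSAVE = 1 */
        return 0;
    xcr0 = _xgetbv(0);
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int a, b, c, d;
    uint32_t lo, hi;
    if (!__get_cpuid(1, &a, &b, &c, &d))  /* 0 if leaf 1 is unavailable */
        return 0;
    ecx = c;
    if (!(ecx & (1u << 27)))        /* xgetbv is only valid if OSXSAVE = 1 */
        return 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    xcr0 = ((uint64_t)hi << 32) | lo;
#else
    return 0;                       /* not an x86 target we know how to query */
#endif
    return avx_usable_from_bits(ecx, xcr0);
}
```

Splitting the bit test from the register reads is just a design choice for this sketch: it lets the logic be checked with fixed inputs, for example avx_usable_from_bits(0x18000000, 0x6) yields 1 while avx_usable_from_bits(0x18000000, 0x2) yields 0.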
Catching exceptions to detect support for the AVX extensions will also do, provided you can guarantee that the exception caught is #UD.
For example, when executing vzeroall the only possible exceptions are #UD and #NM.
The first one is thrown only when:
If XCR0[2:1] ≠ ‘11b’.
If CR4.OSXSAVE[bit 18]=0.
If CPUID.01H.ECX.AVX[bit 28]=0.
If VEX.vvvv ≠ 1111B.
The last condition concerns the instruction encoding, so unless you have a broken assembler/compiler, this list is exactly equivalent to the conditions stated at the beginning.
The latter exception (#NM) is used as an optimisation for saving the AVX state, and as such it is handled by the OS and not exposed to user-mode programs.
Therefore, catching SIGILL on vzeroall or similar would also do.
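For completeness, the SIGILL route could look roughly like the sketch below. It assumes a POSIX system and GCC or Clang on x86; the names avx_probe_with_sigill, avx_probe_on_sigill and avx_probe_env are hypothetical. It temporarily installs a SIGILL handler with sigaction, executes vzeroall, and uses sigsetjmp/siglongjmp to get out of the handler if the instruction faults. The static jump buffer and the process-wide handler mean it is not thread-safe, so it is best run once at startup.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to escape from the SIGILL handler (not thread-safe). */
static sigjmp_buf avx_probe_env;

static void avx_probe_on_sigill(int signo)
{
    (void)signo;
    siglongjmp(avx_probe_env, 1);   /* resume after the faulting probe */
}

/* Returns 1 if vzeroall executed without SIGILL, 0 otherwise. */
int avx_probe_with_sigill(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    struct sigaction probe_action, saved_action;
    volatile int avx_ok = 0;

    memset(&probe_action, 0, sizeof probe_action);
    probe_action.sa_handler = avx_probe_on_sigill;
    sigemptyset(&probe_action.sa_mask);
    if (sigaction(SIGILL, &probe_action, &saved_action) != 0)
        return 0;

    /* Second argument 1: save the signal mask so that siglongjmp
       unblocks SIGILL again after leaving the handler. */
    if (sigsetjmp(avx_probe_env, 1) == 0) {
        /* vzeroall zeroes every vector register, so tell the compiler. */
#if defined(__x86_64__)
        __asm__ volatile("vzeroall" : : :
            "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
            "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
            "xmm15");
#else
        __asm__ volatile("vzeroall" : : :
            "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7");
#endif
        avx_ok = 1;                 /* reached only if no #UD was raised */
    }

    sigaction(SIGILL, &saved_action, NULL);  /* restore the old handler */
    return avx_ok;
#else
    return 0;                       /* probe not implemented for this target */
#endif
}
```

Compared with the CPUID/XCR0 check, this costs a signal round trip on machines without AVX and touches process-wide signal state, which is one reason the CPUID-based algorithm remains the recommended one.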