I am compiling a piece of UEFI C code with Visual Studio 2015 C/C++ compiler.
The compiler is targeting IA32, not X64.
When turning on the optimization with "/O1", the build is OK.
When turning off the optimization with "/Od", the build gives below error:
error LNK2001: unresolved external symbol __aullshr
According to here, there's an explanation why this kind of functions can be called implicitly by the compiler:
It turns out that this function is one of several compiler support
functions that are invoked explicitly by the Microsoft C/C++ compiler.
In this case, this function is called whenever the 32-bit compiler
needs to multiply two 64-bit integers together. The EDK does not link
with Microsoft's libraries and does not provide this function.
Are there other functions like this one? Sure, several more for 64-bit
division, remainder and shifting.
But according to here:
...Compilers that implement intrinsic functions generally enable them
only when a program requests optimization...
So how could such functions still be called when I explicitly turned off the optimization with /Od
??
ADD 1 - 2:32 PM 2/16/2019
It seems I am wrong about the __aullshr
function.
It is not a compiler intrinsic function. According to here, it turns out to be a runtime library function, whose implementation can be found in: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\intel\ullshr.asm
or C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\crt\src\i386\ullshr.asm
Such VC runtime functions are brought in by the compiler for 32-bit applications to do 64-bit operations.
But I still don't know why /O1
can build pass while /Od
failed?
It seems optimization switch can affect the usage of VC runtime library.
ADD 2 - 4:59 PM 2/17/2019
I found the code that cause the build failure.
It turns out to be some C struct bit field operation. There's a 64-bit C struct which has a lot of bit fields backed by a single UINT64 variable. When I comment out the single line of code that access these bit fields, the build is pass. It seems _aullshr()
function is used to access these bit fields when /Od
is specified.
Since this is part of the firmware code, I am wondering if it is a good practice to turn-off the optimization with /Od
?
ADD 3 - 9:33 AM 2/18/2019
I created below minimal reproducible example for VS2015.
First, there's a static lib project:
(test.c)
typedef unsigned __int64 UINT64;
typedef union {
struct {
UINT64 field1 : 16;
UINT64 field2 : 16;
UINT64 field3 : 6;
UINT64 field4 : 15;
UINT64 field5 : 2;
UINT64 field6 : 1;
UINT64 field7 : 1;
UINT64 field8 : 1; //<=========
UINT64 field9 : 1;
UINT64 field10 : 1;
UINT64 field11 : 1;
UINT64 field12 : 1; //<=========
UINT64 field13 : 1;
UINT64 field14 : 1;
} Bits;
UINT64 Data;
} ISSUE_STRUCT;
int
Method1
(
UINT64 Data
)
{
ISSUE_STRUCT IssueStruct;
IssueStruct.Data = Data;
if (IssueStruct.Bits.field8 == 1 && IssueStruct.Bits.field12 == 1) { // <==== HERE
return 1;
}
else
{
return 0;
}
}
Then a Windows DLL project:
(DllMain.c)
#include <Windows.h>
typedef unsigned __int64 UINT64;
int
Method1
(
UINT64 Data
);
int __stdcall DllMethod1
(
HINSTANCE hinstDLL,
DWORD fdwReason,
LPVOID lpReserved
)
{
if (Method1(1234)) //<===== Use the Method1 from the test.obj
{
return 1;
}
return 2;
}
Build process:
First, compile the test.obj:
cl.exe /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL
/EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl test.c
(add: VC++ 2015 compiler gives below warning for test.obj
:
warning C4214: nonstandard extension used: bit field types other than
int
)
Then compile the DllMain.obj:
cl /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL
/EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl DllMain.c
Then link the DllMain.obj to the test.obj
link DllMain.obj ..\aullshr\test.obj /NOLOGO /NODEFAULTLIB
/IGNORE:4001 /OPT:REF /OPT:ICF=10 /MAP /ALIGN:32 /SECTION:.xdata,D
/SECTION:.pdata,D /MACHINE:X86 /LTCG /SAFESEH:NO /DLL
/ENTRY:DllMethod1 /DRIVER
It will give below error:
Generating code Finished generating code test.obj : error LNK2001:
unresolved external symbol __aullshr DllMain.dll : fatal error
LNK1120: 1 unresolved externals
If I remove the bit field manipulation code at HERE in the test.c, the link error will disappear.
If I only remove the /Od from the compile options for the test.c, the link error will disappear.
ADD 4 - 12:40 PM 2/18/2019
Thanks to @PeterCordes in his comment, there's an even simpler way to reproduce this issue. Just invoke below method:
uint64_t shr(uint64_t a, unsigned c) { return a >> c; }
Then compile the source code with below command:
cl /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL
/EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl DllMain.c
link DllMain.obj /NOLOGO /NODEFAULTLIB /IGNORE:4001 /OPT:REF
/OPT:ICF=10 /MAP /ALIGN:32 /SECTION:.xdata,D /SECTION:.pdata,D
/MACHINE:X86 /LTCG /SAFESEH:NO /DLL /ENTRY:DllMethod1 /DRIVER
This issue can be reproduced for:
Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x86 (VS2013)
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24210 for x86 (VS2015)
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86 (VS2015)
As mandated in the UEFI coding standard 5.6.3.4 Bit Fields :
Bit fields may only be of type INT32, signed INT32, UINT32, or a
typedef name defined as one of the three INT32 variants.
So my final solution is to modify the UEFI code to use UINT32
instead of UINT64
.
Your build setup for creating UEFI applications omits the static library of helper functions that MSVC's code-gen expects to be available. MSVC's code-gen sometimes inserts calls to helper functions, the same way that gcc does for 64x64 multiply or divide on 32-bit platforms, or various other things. (e.g. popcount on targets without hardware popcnt.)
In this case hand-holding MSVC into less-stupid code-gen (a good thing in itself) happens to remove all the uses of helper functions for your codebase. That's good but doesn't fix your build setup. It can break again if you add code in the future that needs a helper. uint64_t shr(uint64_t a, unsigned c) { return a >> c; }
does compile to include a call to the helper function even at -O2
.
A shift by a constant without optimization uses _aullshr
, instead of inlining as shrd
/ shr
. This exact problem (broken -Od
builds) would recur with uint64_t x
; x >> 4
or something in your source.
(I don't know where MSVC keeps its library of helper functions. We think it's a static library that you could link without introducing a DLL dependency (impossible for UEFI), but we don't know if its maybe bundled with some CRT startup code that you need to avoid linking with for UEFI.)
The un-optimized vs. optimized issue is clear with this example. MSVC with optimization doesn't need the helper function, but its braindead -Od
code does.
For bitfield access, MSVC apparently uses a right shift of the base type of the bitfield member. In your case, you made it a 64-bit type, and 32-bit x86 doesn't have a 64-bit integer shift (except using MMX or SSE2). With -Od
even for constant counts it puts the data in EDX:EAX, the shift count in cl
(just like for x86 shift instructions), and calls __aullshr
.
__a
= ??
ull
= unsigned long long.
shr
= shift right (like the x86 asm instruction of the same name).
- it takes the shift count in
cl
, exactly like x86 shift instructions.
From the Godbolt compiler explorer, x86 MSVC 19.16 -Od
, with UINT64
as the bitfield member type.
;; from int Method1(unsigned __int64) PROC
...
; extract IssueStruct.Bits.field8
mov eax, DWORD PTR _IssueStruct$[ebp]
mov edx, DWORD PTR _IssueStruct$[ebp+4]
mov cl, 57 ; 00000039H
call __aullshr ; emulation of shr edx:eax, cl
and eax, 1
and edx, 0
;; then store that to memory and cmp/jcc both halves. Ultra braindead
Obviously for constant shift and access to only 1 bit, that's easy to optimize, so MSVC doesn't actually call the helper function at -O2
. It's still pretty inefficient, though! It fails to fully optimize away the 64-bitness of the base type, even though none of the bitfields are wider than 32.
; x86 MSVC 19.16 -O2 with unsigned long long as the bitfield type
int Method1(unsigned __int64) PROC ; Method1, COMDAT
mov edx, DWORD PTR _Data$[esp] ; load the high half of the inputs arg
xor eax, eax ; zero the low half?!?
mov ecx, edx ; copy the high half
and ecx, 33554432 ; 02000000H ; isolate bit 57
or eax, ecx ; set flags from low |= high
je SHORT $LN2@Method1
and edx, 536870912 ; 20000000H ; isolate bit 61
xor eax, eax ; re-materialize low=0 ?!?
or eax, edx ; set flags from low |= high
je SHORT $LN2@Method1
mov eax, 1
ret 0
$LN2@Method1:
xor eax, eax
ret 0
int Method1(unsigned __int64) ENDP ; Method1
Obviously this is really stupid materializing a 0
for the low half instead of just ignoring it. MSVC does much better if we change the bitfield member type to unsigned
. (In the Godbolt link, I changed that to bf_t
so I could use a typedef separate from UINT64, keeping that for the other union member.)
With a struct based on unsigned field : 1
bitfield members, MSVC doesn't need the helper at -Od
And it even makes better code at -O2
, so you should definitely do that in your real production code. Only use uint64_t
or unsigned long long
members for fields that need to be wider than 32 bits, if you care about performance on MSVC which apparently has a missed-optimization bug with 64-bit types for bitfield members.
;; MSVC -O2 with plain unsigned (or uint32_t) bitfield members
int Method1(unsigned __int64) PROC ; Method1, COMDAT
mov eax, DWORD PTR _Data$[esp]
test eax, 33554432 ; 02000000H
je SHORT $LN2@Method1
test eax, 536870912 ; 20000000H
je SHORT $LN2@Method1
mov eax, 1
ret 0
$LN2@Method1:
xor eax, eax
ret 0
int Method1(unsigned __int64) ENDP ; Method1
I might have implemented it branchlessly like ((high >> 25) & (high >> 29)) & 1
with 2 shr
instructions and 2 and
instructions (and a mov
). If it's really predictable, though, branching is reasonable and breaks the data dependency. clang does a nice job here, though, using not
+ test
to test both bits at once. (And setcc
to get the result as an integer again). This has better latency than my idea, especially on CPUs without mov-elimination. clang doesn't have a missed optimization for bitfields based on 64-bit types, either. We get the same code either way.
# clang7.0 -O3 -m32 regardless of bitfield member type
Method1(unsigned long long): # @Method1(unsigned long long)
mov ecx, dword ptr [esp + 8]
xor eax, eax # prepare for setcc
not ecx
test ecx, 570425344 # 0x22000000
sete al
ret
UEFI coding standard:
The EDK II coding standard 5.6.3.4 Bit Fields says that:
- Bit fields may only be of type
INT32
, signed INT32
, UINT32
, or a typedef name defined as one of the three INT32
variants.
I don't know why they make up these "INT32" names when C99 already has perfectly good int32_t
. It's also unclear why they'd place this restriction. Perhaps because of the MSVC missed-optimization bug? Or maybe to aid human programmer comprehension by disallowing some "weird stuff".
gcc and clang don't warn about unsigned long long
as a bitfield type, even in 32-bit mode and with -Wall -Wextra -Wpedantic
, in C or C++ mode. I don't think ISO C or ISO C++ have a problem with it.
Further, Should use of bit-fields of type int be discouraged? points out that plain int
as a bitfield type should be discouraged because the signedness is implementation-defined. And that the ISO C++ standard discusses bitfield types from char
to long long
.
I think your MSVC warning about non-int
bitfields must be from some kind of coding-standard-enforcement package, because normal MSVC on Godbolt doesn't do that even with `-Wall.
warning C4214: nonstandard extension used: bit field types other than int