I'm investigating the BIOS code on my machine (x86_64 Linux, Ivy Bridge). I use the following procedure to dump the BIOS code:
$ sudo cat /proc/iomem | grep ROM
000f0000-000fffff : System ROM
$ sudo dd if=/dev/mem of=bios.dump bs=1M count=1
Then I use radare2 to read and disassemble the binary dump:
$ r2 -b 16 bios.dump
[0000:0000]> s 0xffff0
[f000:fff0]> pd 3
: f000:fff0 0f09 wbinvd
`=< f000:fff2 e927f5 jmp 0xff51c
f000:fff5 0000 add byte [bx + si], al
I know x86 processor initialization always starts in a 16-bit 8086 environment, and the first instruction to be executed is at f000:fff0, i.e. 0xffff0. So I go to that location and disassemble the code.
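As a quick sanity check of the real-mode arithmetic in the listing above, here is a minimal Python sketch (the function names are made up for illustration). It applies the textbook segment * 16 + offset rule and decodes the `e927f5` bytes at f000:fff2 as a 16-bit `jmp rel16`. It assumes ordinary real-mode rules for cs, which, as discussed further down, is not quite what the CPU does right after reset.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset (no A20 masking applied)."""
    return (segment << 4) + offset


def decode_jmp_rel16(segment: int, offset: int, code: bytes) -> int:
    """Return the linear target of a 16-bit near `jmp rel16` (opcode 0xE9) at segment:offset."""
    if code[0] != 0xE9:
        raise ValueError("not a jmp rel16")
    rel = int.from_bytes(code[1:3], "little", signed=True)  # sign-extended displacement
    next_ip = (offset + 3) & 0xFFFF        # IP of the instruction after the 3-byte jmp
    target_ip = (next_ip + rel) & 0xFFFF   # 16-bit IP arithmetic wraps within the segment
    return real_mode_linear(segment, target_ip)


# Bytes taken from the radare2 listing: f000:fff2 e927f5 jmp 0xff51c
print(hex(real_mode_linear(0xF000, 0xFFF0)))                            # 0xffff0
print(hex(decode_jmp_rel16(0xF000, 0xFFF2, bytes.fromhex("e927f5"))))   # 0xff51c
```

The displacement 0xf527 is negative (-0xad9), so the jump goes backwards from 0xfff5 to 0xf51c within the same 64 KiB segment.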
To my surprise, the first instruction is WBINVD, which writes back and invalidates the caches; that seems irrelevant when the processor has just been powered on or reset. I would expect the first instruction to simply be a jmp to a lower memory address.
Why is there a WBINVD before the jmp?
I've already searched the relevant portion of the Intel manuals (Volume 3, Chapter 9, Processor Management and Initialization), but it doesn't mention anything about WBINVD. I also searched some online resources but didn't find any explanation.
Edit for more info:
After following the jmp instruction to 0xff51c, the code is more interesting; it's doing a self-check:
[f000:f51c]> pd
f000:f51c dbe3 fninit
f000:f51e 0f6ec0 movd mm0, eax
f000:f521 6631c0 xor eax, eax
f000:f524 8ec0 mov es, ax
f000:f526 8cc8 mov ax, cs
f000:f528 8ed8 mov ds, ax
f000:f52a b800f0 mov ax, 0xf000
f000:f52d 8ec0 mov es, ax
f000:f52f 6726a0f0ff00. mov al, byte es:[0xfff0] ; [0xfff0:1]=0
f000:f536 3cea cmp al, 0xea
,=< f000:f538 750f jne 0xff549
| f000:f53a b91b00 mov cx, 0x1b
| f000:f53d 0f32 rdmsr ; check BSP (Boot Strap Processor) flag, if set, loop back to 0xffff0; otherwise, infinite hlt
| f000:f53f f6c401 test ah, 1
,==< f000:f542 7441 je 0xff585
,===< f000:f544 eaf0ff00f0 ljmp 0xf000:0xfff0
||`-> f000:f549 b001 mov al, 1
|| f000:f54b e680 out 0x80, al
|| f000:f54d 66be8cfdffff mov esi, 0xfffffd8c ; 4294966668
|| f000:f553 662e0f0114 lgdt cs:[si]
|| f000:f558 0f20c0 mov eax, cr0
|| f000:f55b 6683c803 or eax, 3
|| f000:f55f 0f22c0 mov cr0, eax
|| f000:f562 0f20e0 mov eax, cr4
|| f000:f565 660d00060000 or eax, 0x600
|| f000:f56b 0f22e0 mov cr4, eax
|| f000:f56e b81800 mov ax, 0x18
|| f000:f571 8ed8 mov ds, ax
|| f000:f573 8ec0 mov es, ax
|| f000:f575 8ee0 mov fs, ax
|| f000:f577 8ee8 mov gs, ax
|| f000:f579 8ed0 mov ss, ax
|| f000:f57b 66be92fdffff mov esi, 0xfffffd92 ; 4294966674
|| f000:f581 662eff2c ljmp cs:[si]
|`.-> f000:f585 fa cli
| : f000:f586 f4 hlt
| `=< f000:f587 ebfc jmp 0xff585
To sum up the weirdness, this BIOS code reads its own byte at 0xffff0 and compares it with 0xea, which is exactly the opcode of a far jump:
f000:f52a b800f0 mov ax, 0xf000
f000:f52d 8ec0 mov es, ax
f000:f52f 6726a0f0ff00. mov al, byte es:[0xfff0] ; [0xfff0:1]=0
f000:f536 3cea cmp al, 0xea
If it finds that the code at 0xffff0 is a far jump, it goes into an infinite loop.
More precisely, the APs (Application Processors) will loop forever at the hlt instruction, while the BSP (Bootstrap Processor) will loop back to the beginning at 0xffff0. Since the code at 0xffff0 won't change, we can conclude that the BSP will always find the byte 0xea there and will never get out of the loop.
So what's the purpose of this self-checking? I can hardly believe it's a naive attempt to prevent modification.
Albeit hard to reason about, remember that the load mov al, byte es:[0xfff0] is not reading from the BIOS's first instruction, even though es is set to 0xf000.
The first instruction is read from 0xfffffff0. The PCH will probably also alias 0xf0000-0xfffff to 0xffff0000-0xffffffff (the top 64 KiB of the flash) at reset, so when the BSP boots it will execute the code you dumped.
IIRC, the APs don't boot unless explicitly woken up.
The BSP will then proceed with initialising the HW (judging from the dump).
At some point it will program the attribute map (the PAM registers) for the 0xf0000-0xfffff range to steer reads and writes (or just writes, and later reads) to memory.
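To make the "steer reads and writes" idea concrete, here is a toy Python model of one PAM field covering the F segment. It is a sketch under stated assumptions: the 2-bit encoding (00 = DRAM disabled, 01 = read-only, 10 = write-only, 11 = normal DRAM) is the one Intel host-bridge datasheets describe, the power-on default of 00 is assumed, and the `FSegment` class and its "patch" step are purely hypothetical. It shows how write-only mode lets a BIOS copy the ROM into DRAM by reading and writing the same address, and how the same address can then return different bytes depending on the PAM setting.

```python
from enum import Enum


class Target(Enum):
    DRAM = "DRAM"
    DMI = "DMI (flash)"


def route(pam_bits: int, is_write: bool) -> Target:
    """Route an access through one 2-bit PAM field (assumed Intel-style encoding)."""
    if pam_bits == 0b00:   # DRAM disabled: all accesses go to DMI
        return Target.DMI
    if pam_bits == 0b01:   # read-only: reads -> DRAM, writes -> DMI
        return Target.DMI if is_write else Target.DRAM
    if pam_bits == 0b10:   # write-only: writes -> DRAM, reads -> DMI
        return Target.DRAM if is_write else Target.DMI
    if pam_bits == 0b11:   # normal DRAM operation
        return Target.DRAM
    raise ValueError("a PAM field is 2 bits wide")


class FSegment:
    """Hypothetical toy model of 0xf0000-0xfffff: a flash image plus the DRAM behind it."""

    def __init__(self, flash: dict) -> None:
        self.flash = dict(flash)
        self.dram: dict = {}
        self.pam = 0b00    # assumed power-on default: everything goes to the flash

    def read(self, addr: int) -> int:
        src = self.dram if route(self.pam, is_write=False) is Target.DRAM else self.flash
        return src.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        if route(self.pam, is_write=True) is Target.DRAM:
            self.dram[addr] = value
        # writes forwarded to DMI are simply dropped by the flash in this model


f = FSegment({0xFFFF0: 0x0F})          # 0x0f = first byte of wbinvd
f.pam = 0b10                           # write-only: copy the ROM byte into DRAM
f.write(0xFFFF0, f.read(0xFFFF0))
f.pam = 0b11                           # from now on reads and writes hit DRAM
f.write(0xFFFF0, 0xEA)                 # hypothetical: later code patches the RAM copy
print(hex(f.read(0xFFFF0)))            # 0xea
f.pam = 0b00                           # back to the flash
print(hex(f.read(0xFFFF0)))            # 0xf
```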
The end result is that when a processor (a hardware thread) boots, it executes code from the flash until it performs a far jump. At that point the cs base is recomputed according to the real-mode rules (until then, the hidden cs base of 0xffff0000 makes it behave pretty much like unreal mode), and instructions are fetched from 0xf0000-0xfffff (i.e. from RAM).
All of this happens while the cs selector value (0xf000) doesn't actually change.
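The selector/base distinction can be sketched in a few lines of Python. The `SegmentRegister` class is a hypothetical model, not any real API; the reset values (cs selector 0xf000 with hidden base 0xffff0000) are the architectural reset state listed in Intel SDM Vol. 3, Chapter 9. It shows why the same f000:fff0 address fetches from 0xfffffff0 at reset but from 0xffff0 after the far jump.

```python
from dataclasses import dataclass


@dataclass
class SegmentRegister:
    selector: int
    base: int  # hidden descriptor-cache base actually used for address generation

    def load_real_mode(self, selector: int) -> None:
        """Real-mode segment load (mov sreg / far jmp): base = selector * 16."""
        self.selector = selector
        self.base = selector << 4

    def linear(self, offset: int) -> int:
        return self.base + offset


# Architectural reset state of cs: selector 0xf000, hidden base 0xffff0000
cs = SegmentRegister(selector=0xF000, base=0xFFFF0000)
print(hex(cs.linear(0xFFF0)))   # 0xfffffff0: the first fetch comes from the top of the flash

es = SegmentRegister(selector=0x0000, base=0x0)
es.load_real_mode(0xF000)       # mov ax, 0xf000; mov es, ax
print(hex(es.linear(0xFFF0)))   # 0xffff0: the "self-check" load targets the F segment

cs.load_real_mode(0xF000)       # ljmp 0xf000:0xfff0
print(hex(cs.linear(0xFFF0)), hex(cs.selector))  # 0xffff0 0xf000: same selector, new base
```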
At some point the BSP will start its multiprocessor initialisation routine, where it broadcasts an INIT-SIPI-SIPI to everyone (including itself); this results in a sleep for the APs and a ljmp 0xf000:0xfff0 for the BSP.
The trick here is that the target of the jump, 0xf000:0xfff0, is not at the same bus address as the wbinvd instruction.
There could be something else there, probably another initialisation routine.
At the end of the initialisation the BIOS could simply reset the attributes of the 0xf0000-0xfffff range to fall through to the flash (so that a software reset is possible), preventing (not intentionally) a dump of the intermediary code.
This is not very efficient, but BIOSes are not usually masterpieces of code.
I don't have enough information to be sure what's going on; my point is that the ljmp 0xf000:0xfff0 and the mov al, byte es:[0xfff0] don't have to read from the same region they reside in.
With this in mind, all bets are off.
Only proper reverse engineering will tell.
Regarding the wbinvd, I suggested in the comments that it could be related to the warm boot facility, and Peter Cordes suggested that it may specifically have to do with cache-as-RAM.
It makes sense, though I guess we will never be sure.
It could just as well be a case of cargo-cult programming, where a programmer deemed the instruction necessary based on rumors.
This is actually the answer to the title question:
Hadi Brais: According to slide 14 of BIOS and System Management Mode Internals, the wbinvd instruction was there in UDK2010 but was later removed in UDK2012. Perhaps it's security-related. I don't know exactly what.
I can confirm that this instruction is not present at 0xfffffff0 on my machine from 2017.
There is a more burning question here, and that's what the comparison with 0xea means.
Here is the code that my reset vector at 0xfffffff0 jumps to:
0x00: DB E3 fninit
0x02: 0F 6E C0 movd mm0, eax //move BIST value to mm0
0x05: 0F 31 rdtsc
0x07: 0F 6E EA movd mm5, edx
0x0a: 0F 6E F0 movd mm6, eax //save tsc
0x0d: 66 33 C0 xor eax, eax //clear eax
0x10: 8E C0 mov es, ax
0x12: 8C C8 mov ax, cs
0x14: 8E D8 mov ds, ax
0x16: B8 00 F0 mov ax, 0xf000
0x19: 8E C0 mov es, ax
0x1b: 67 26 A0 F0 FF 00 00 mov al, byte ptr es:[0xfff0]
0x22: 3C EA cmp al, 0xea
0x24: 74 0E je 0x34 //if ea is at ffff0h then jump to the 0xf000e05b check
0x26: BA F9 0C mov dx, 0xcf9
0x29: EC in al, dx //read port 0xcf9
0x2a: 3C 04 cmp al, 4
0x2c: 75 25 jne 0x53
0x2e: BA F9 0C mov dx, 0xcf9 //perform hard reset since if CPU only reset is issued not all MSRs are restored to their defaults
0x31: B0 06 mov al, 6
0x33: EE out dx, al
0x34: 67 66 26 A1 F1 FF 00 00 mov eax, dword ptr es:[0xfff1]
0x3c: 66 3D 5B E0 00 F0 cmp eax, 0xf000e05b
0x42: 75 0F jne 0x53 //if it isn't, move to notwarmstart; it's not a warm start because BIOS shadow isn't present
0x44: B9 1B 00 mov cx, 0x1b //if it is equal, read bsp bit from apic_base msr
0x47: 0F 32 rdmsr
0x49: F6 C4 01 test ah, 1
0x4c: 74 41 je 0x8f //if the and operation with 00000001b produces a zero result i.e. it's an AP then jump to cli, hlt
0x4e: EA F0 FF 00 F0 ljmp 0xf000:0xfff0 //if it's the BSP and the shadow ROM is present, jump to 0xffff0
notwarmstart:
0x53: B0 01 mov al, 1
0x55: E6 80 out 0x80, al //send 1 as a debug POST code
0x57: 66 BE 68 FF FF FF mov esi, 0xffffff68
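0x5d: 66 2E 0F 01 14 lgdt cs:[si] //loads the GDT pointer from cs:si = 0xffff0000 + 0xff68 = 0xffffff68 into GDTR (base:ffffff28 limit:003f); the 66 prefix makes it a 32-bit base + 16-bit limit instead of a 24-bit base + 16-bit limit; this reads the flash at the top of the 4 GiB space, not the shadow ROM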
//enter 16 bit protected mode//
0x62: 0F 20 C0 mov eax, cr0
0x65: 66 83 C8 03 or eax, 3 //Set PE bit (bit #0) & MP bit (bit #1)
0x69: 0F 22 C0 mov cr0, eax //Activate protected mode
0x6c: 0F 20 E0 mov eax, cr4
0x6f: 66 0D 00 06 00 00 or eax, 0x600 //Set OSFXSR bit (bit #9) & OSXMMEXCPT bit (bit #10)
0x75: 0F 22 E0 mov cr4, eax
//set up selectors for 32 bit protected mode entry
0x78: B8 18 00 mov ax, 0x18 //segment descriptor at 0x18 in GDT is (raw): 00cf93000000ffff
0x7b: 8E D8 mov ds, ax
0x7d: 8E C0 mov es, ax
0x7f: 8E E0 mov fs, ax
0x81: 8E E8 mov gs, ax
0x83: 8E D0 mov ss, ax
0x85: 66 BE 6E FF FF FF mov esi, 0xffffff6e
0x8b: 66 2E FF 2C ljmp cs:[si] //transition to flat 32-bit protected mode: far jump through the pointer stored at cs:si = 0xffff0000 + 0xff6e = 0xffffff6e, whose offset is fffffcd8. Remember that the cs selector is still 0xf000; it's the hidden base that is 0xffff0000.
//PEI begins at that address
0x8f: FA cli
0x90: F4 hlt
.
.
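The comment on `mov ax, 0x18` gives the raw descriptor `00cf93000000ffff`. Decoding it with the standard legacy x86 descriptor layout confirms it is a flat 4 GiB read/write data segment, which is what the "flat 32-bit protected mode" comments rely on. This is a minimal Python sketch; `decode_descriptor` is a made-up helper, not part of any firmware.

```python
def decode_descriptor(raw: int) -> dict:
    """Decode a legacy 8-byte x86 segment descriptor given as a 64-bit integer."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)           # limit[15:0], limit[19:16]
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)  # base[23:0], base[31:24]
    access = (raw >> 40) & 0xFF                                      # P, DPL, S, type
    flags = (raw >> 52) & 0xF                                        # G, D/B, L, AVL
    granular = bool(flags & 0x8)
    if granular:                                                     # limit counts 4 KiB pages
        limit = (limit << 12) | 0xFFF
    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        "code_or_data": bool(access & 0x10),  # S bit: code/data rather than system descriptor
        "type": access & 0xF,                 # 0x3 = data, read/write, accessed
        "big_32bit": bool(flags & 0x4),       # D/B bit
    }


d = decode_descriptor(0x00CF93000000FFFF)
print(hex(d["base"]), hex(d["limit"]), hex(d["type"]))   # 0x0 0xffffffff 0x3
```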
Notice that my code differs from yours: there is an extra comparison with 0xf000e05b and a read/write of port 0xcf9.
A clue from the edk2 source code is that the code being jumped to is called 'NotWarmStart'. The code speaks for itself. The key to solving this is to analyse the 3 different implementations carefully (plus your observations of UEFI legacy boot vs UEFI boot).
In mine, if 0xEA is at 0xFFFF0, it checks the dword at 0xFFFF1 for 0xf000e05b. If 0xf000e05b is there, it checks the BSP flag, and if it's the BSP, it jumps to 0xFFFF0. If 0xf000e05b isn't there, it jumps to the 16-bit + 32-bit protected mode setup (called 'NotWarmStart'), which then far-jumps, through the pointer stored at 0xffffff6e, into the 32-bit flat protected mode code. edk2 calls this PEI, but I'd say PEI classically begins at the PEI core, and the code it jumps to is actually still SEC, given that it uses FSP to set up CAR, optionally performs microcode updates if Boot Guard isn't present, and passes control to the PEI core. If 0xEA is not present, it checks port 0xcf9 for 'Check INIT# is asserted', i.e. whether the register reads 4. If it does, it performs a hard reset by writing 0x6, which results in a PLTRST#, the reason being 'issue warm start, since if CPU only reset is issued not all MSRs are restored to their defaults'. Otherwise it jumps to 'NotWarmStart'.
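The branch structure described above can be summarised as a small Python decision function. This is a sketch that only mirrors the disassembly; the enum names are invented for illustration, and `0xFEE00900`/`0xFEE00800` are example IA32_APIC_BASE values (MSR 0x1b, where bit 8 is the BSP flag tested by `test ah, 1`). Note that the bytes `EA 5B E0 00 F0` that the code looks for at 0xffff0 are the encoding of `jmp far 0xf000:0xe05b`, the traditional PC BIOS entry point.

```python
from enum import Enum, auto


class Action(Enum):
    JUMP_TO_F_SEGMENT = auto()  # BSP: ljmp 0xf000:0xfff0
    HALT = auto()               # AP: cli; hlt
    HARD_RESET = auto()         # out 0xcf9, 6
    NOT_WARM_START = auto()     # POST code 1, protected mode, far jump to SEC


def reset_vector_action(f_segment: bytes, cf9: int, apic_base_msr: int) -> Action:
    """Mirror the branches of the reset-vector listing.

    f_segment: the 5 bytes read at 0xffff0 (whatever currently backs that address)
    cf9: value read from I/O port 0xcf9
    apic_base_msr: value of IA32_APIC_BASE (MSR 0x1b); bit 8 is the BSP flag
    """
    if f_segment[0] == 0xEA:                                        # cmp al, 0xea / je 0x34
        if int.from_bytes(f_segment[1:5], "little") == 0xF000E05B:  # cmp eax, 0xf000e05b
            is_bsp = bool(apic_base_msr & (1 << 8))                 # test ah, 1
            return Action.JUMP_TO_F_SEGMENT if is_bsp else Action.HALT
        return Action.NOT_WARM_START
    if cf9 == 0x04:                                                 # cmp al, 4 / jne 0x53
        return Action.HARD_RESET
    return Action.NOT_WARM_START


legacy_stub = bytes.fromhex("ea5be000f0")   # jmp far 0xf000:0xe05b
print(reset_vector_action(legacy_stub, cf9=0, apic_base_msr=0xFEE00900))  # Action.JUMP_TO_F_SEGMENT
print(reset_vector_action(legacy_stub, cf9=0, apic_base_msr=0xFEE00800))  # Action.HALT
print(reset_vector_action(bytes(5), cf9=0x04, apic_base_msr=0xFEE00900))  # Action.HARD_RESET
```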
There are two possible explanations for why 0xffff0 can contain something different from 0xfffffff0 at reset. 1) RAM contains data and the PAMs are steering the 0xf0000-0xfffff range to RAM rather than to the SPI ROM. RAM would only contain data if some kind of soft reset occurred, like INIT#, where RAM is unaffected. 2) UEFI legacy boot causes the Intel ME to set the PAMs to default to RAM rather than SPI ROM, or disables the BIOS decode enable bit BIOS_LEGACY_F_EN on the LPC or SPI bridge (which seems a bit unlikely and elaborate to me; I feel like the default values will hold true at the reset vector).
At runtime, your dump shows identical code at 0xffff0 and 0xfffffff0 for UEFI boot but different code for UEFI legacy boot. It looks to me like in UEFI mode there is no shadow ROM in RAM at 0xffff0. You're probably directly accessing the SPI ROM, because there's no reason for that range to be touched: legacy option ROMs aren't required (I've got legacy option ROMs shadowed on my UEFI legacy boot system), and in UEFI mode the DXE drivers present in the XROMBAR space are used instead.
Just looking at your code, it's easy to say what the check for 0xea means: 'if 0xea isn't there, it's a UEFI boot, so jump to 32-bit SEC and determine whether it's a warm start later'; 'if 0xea is there, it's a warm start and the previous boot was a legacy boot, so jump to the shorthand implementation at 0xffff0'.
The problem is that my code reveals a 3rd option, and it has to be there for a reason. 0xffff0 can be in 3 different states: not containing 0xea (hard reset if 0xcf9 reads 4, otherwise jump to 32-bit SEC); containing 0xea followed by 0xf000e05b (jump to 0xffff0 if BSP, otherwise hlt); containing 0xea not followed by 0xf000e05b (jump to 32-bit SEC).
My guess is that 0xea followed by 0xf000e05b means a legacy boot and a warm start. 0xea not followed by 0xf000e05b means a warm UEFI boot. No 0xea means RAM contains nothing useful in either mode, so if it actually is a warm boot, it needs to issue a PLTRST#. That's sort of the only option remaining.
That leads me to theorise that, since that 3rd check doesn't occur in your UEFI BIOS, you see identical code in UEFI mode, whereas if I were to boot into UEFI mode, I reckon I'd see different code at 0xffff0 than at 0xfffffff0, and also different from what would be there in UEFI legacy boot. This is possibly some 16-bit shorthand for a UEFI warm boot in shadowed RAM, which is still present after a warm boot, and UEFI will detect it and jump to it / use this data later on. On your system, this shadow RAM location is not being used and accesses are directed to the SPI ROM instead. Maybe yours implements it differently: it shadows to a different region of the 1 MiB space, uses a different PAM, and detects it later on (and therefore doesn't need to qualify the 0xea check with an extra step).
It may assume the UEFI shadow in the 700 MiB range is corrupt (because the OS could overwrite it but some of it remains resident; I'm not sure what the policy is on this). The 1 MiB range may be the only safe place to shadow warm start data, and it can't shadow to 0xff000000–0xffffffff, as that range is always decoded to DMI and the DRAM behind it is typically reclaimed (remapped) elsewhere. If it assumes the OS doesn't overwrite the UEFI data in RAM, then your shadow might not be in the lower 1 MiB at all, and the check further in may be checking the 700 MiB region for the warm start implementation. The warm start implementation will assume services are loaded and devices are already enumerated, and will let you select a new boot device if you want.
The reason edk2 calls the routine 'NotWarmStart', even though it doesn't check RAM / support warm start like our implementations do, is, I'd imagine, that 0xcf9 tells the processor whether a warm boot / soft reset has occurred on the system (i.e. an INIT# has been sent to the processor: the register reads 4, meaning bit 2 (RST_CPU) is set and bit 1 (SYS_RST) is clear, and the code currently executing is implicitly on the processor that was INITed; I can only assume this value is cleared only by a PLTRST# or by writing 0 to it). Therefore it can still tell that it's a warm start, but it needs to perform a PLTRST# (whether RAM contains useful data or not), because the warm start system state will never be used.
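For reference, here is a minimal Python sketch of how the values involved map onto the reset control register at I/O port 0xcf9. The bit layout (bit 3 FULL_RST, bit 2 RST_CPU, bit 1 SYS_RST) is taken from Intel PCH datasheets; treating the read-back value as "the last value written" is an assumption, which is exactly what the `cmp al, 4` in the listing seems to rely on.

```python
FULL_RST = 1 << 3   # with a hard reset, request a full power cycle
RST_CPU = 1 << 2    # a 0 -> 1 transition triggers the reset
SYS_RST = 1 << 1    # 1 = hard reset (PLTRST#), 0 = soft reset (INIT#)


def describe_cf9(value: int) -> str:
    """Interpret a value written to (or assumed read back from) port 0xcf9."""
    if not value & RST_CPU:
        return "no reset requested"
    if not value & SYS_RST:
        return "soft reset (INIT#)"     # FULL_RST only matters for hard resets
    if value & FULL_RST:
        return "full reset (power cycle)"
    return "hard reset (PLTRST#)"


for v in (0x04, 0x06, 0x0E):   # 0x04 is what the code compares against, 0x06 what it writes
    print(hex(v), describe_cf9(v))
```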
Also, there is no loop at the hlt. hlt enters the HALT state, and an INIT IPI puts the processor into the wait-for-SIPI state. Execution then begins at whatever address the BSP selects for the AP.
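The address the BSP "selects" is encoded in the 8-bit vector of the Startup IPI: per the Intel SDM's multiprocessor initialisation description, an AP receiving vector VV starts in real mode at 000VV000H, i.e. cs = VV00h and ip = 0. A minimal Python sketch (`sipi_entry` and the vector 0x10 are hypothetical, chosen only for illustration):

```python
def sipi_entry(vector: int) -> tuple:
    """Real-mode entry state of an AP after a Startup IPI with the given 8-bit vector."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("the SIPI vector is 8 bits")
    cs = vector << 8          # cs selector = VV00h, so the cs base is VV000h
    ip = 0x0000
    return cs, ip, (cs << 4) + ip


cs, ip, phys = sipi_entry(0x10)   # hypothetical vector chosen by the BSP
print(f"{cs:#06x}:{ip:#06x} -> {phys:#x}")   # 0x1000:0x0000 -> 0x10000
```

This is why SIPI targets must be 4 KiB-aligned and below 1 MiB.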