Why does the BIOS entry point start with a WBINVD?

Published 2020-06-08 19:40

Question:

I'm investigating the BIOS code on my machine (x86_64 Linux, Ivy Bridge). I use the following procedure to dump the BIOS code:

$ sudo cat /proc/iomem | grep ROM
  000f0000-000fffff : System ROM
$ sudo dd if=/dev/mem of=bios.dump bs=1M count=1
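Because `dd` starts at physical address 0 here, a file offset in bios.dump is the same as the physical address, so the System ROM window reported by /proc/iomem can be sliced straight out of the dump. A minimal Python sketch of that, assuming the `start-end : name` line format shown above; `find_iomem_range` and `system_rom` are hypothetical helpers, not part of any tool:

```python
def find_iomem_range(iomem_text: str, name: str) -> tuple[int, int]:
    """Return the inclusive (start, end) of the first /proc/iomem entry called `name`."""
    for line in iomem_text.splitlines():
        # Nested entries are indented; each line looks like "000f0000-000fffff : System ROM"
        rng, sep, label = line.strip().partition(" : ")
        if sep and label == name:
            start, end = rng.split("-")
            return int(start, 16), int(end, 16)
    raise KeyError(name)


def system_rom(dump: bytes, iomem_text: str) -> bytes:
    """Slice the System ROM out of a dump of physical memory taken from address 0."""
    start, end = find_iomem_range(iomem_text, "System ROM")
    return dump[start:end + 1]
```

As root you could call `system_rom(open("bios.dump", "rb").read(), open("/proc/iomem").read())`. Unprivileged readers of /proc/iomem see the addresses zeroed out, which is one reason to read it with sudo.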

Then I use radare2 to read and disassemble the binary dump:

$ r2 -b 16 bios.dump 
[0000:0000]> s 0xffff0
[f000:fff0]> pd 3
        :   f000:fff0      0f09           wbinvd
        `=< f000:fff2      e927f5         jmp 0xff51c
            f000:fff5      0000           add byte [bx + si], al
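The target radare2 prints for the jmp at f000:fff2 can be checked by hand: `E9` is a near jump whose signed 16-bit displacement is added to the IP of the next instruction. A small Python sketch of that arithmetic (`near_jmp_target` is a hypothetical helper name):

```python
def near_jmp_target(ip: int, code: bytes) -> int:
    """Target IP of a 16-bit `jmp rel16` (opcode E9) located at `ip`."""
    if code[0] != 0xE9:
        raise ValueError("not a near relative jmp")
    rel = int.from_bytes(code[1:3], "little", signed=True)
    next_ip = (ip + 3) & 0xFFFF          # displacement is relative to the next instruction
    return (next_ip + rel) & 0xFFFF      # IP wraps within the 64 KiB segment
```

For `e9 27 f5` at IP 0xfff2, the displacement is -0x0ad9 and the next IP is 0xfff5, giving 0xf51c, which radare2 shows as the linear address 0xff51c.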

I know x86 processor initialization always starts with a 16-bit 8086 environment, and the first instruction to be executed is at f000:fff0, i.e. 0xffff0. So I go to that location and disassemble the code.
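The f000:fff0 to 0xffff0 conversion is the real-mode rule linear = segment * 16 + offset. A one-line sketch (hypothetical helper name); note that it only describes addresses once a segment register has been loaded the normal real-mode way, which, as the answers point out, is not the case for the very first fetch after reset:

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset
```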

To my surprise, the first instruction is WBINVD, which writes back and invalidates the caches, something that seems irrelevant when the processor has just been powered on or reset. I would expect the first instruction to simply be a jmp to a lower memory address.

Why is there a WBINVD before jmp?

I've already searched the relevant portion of the Intel manuals, Volume 3 Chapter 9 Processor Management and Initialization, but it doesn't mention anything about WBINVD. I also searched some online resources but didn't find any explanation.

Edit for more info:

After following the jmp instruction to 0xff51c, the code is more interesting; it's doing a self-check:

[f000:f51c]> pd
            f000:f51c      dbe3           fninit
            f000:f51e      0f6ec0         movd mm0, eax
            f000:f521      6631c0         xor eax, eax
            f000:f524      8ec0           mov es, ax
            f000:f526      8cc8           mov ax, cs
            f000:f528      8ed8           mov ds, ax
            f000:f52a      b800f0         mov ax, 0xf000
            f000:f52d      8ec0           mov es, ax
            f000:f52f      6726a0f0ff00.  mov al, byte es:[0xfff0]     ; [0xfff0:1]=0
            f000:f536      3cea           cmp al, 0xea
        ,=< f000:f538      750f           jne 0xff549
        |   f000:f53a      b91b00         mov cx, 0x1b
        |   f000:f53d      0f32           rdmsr  ; check BSP (Boot Strap Processor) flag, if set, loop back to 0xffff0; otherwise, infinite hlt
        |   f000:f53f      f6c401         test ah, 1
       ,==< f000:f542      7441           je 0xff585
      ,===< f000:f544      eaf0ff00f0     ljmp 0xf000:0xfff0
      ||`-> f000:f549      b001           mov al, 1
      ||    f000:f54b      e680           out 0x80, al
      ||    f000:f54d      66be8cfdffff   mov esi, 0xfffffd8c          ; 4294966668
      ||    f000:f553      662e0f0114     lgdt cs:[si]
      ||    f000:f558      0f20c0         mov eax, cr0
      ||    f000:f55b      6683c803       or eax, 3
      ||    f000:f55f      0f22c0         mov cr0, eax
      ||    f000:f562      0f20e0         mov eax, cr4
      ||    f000:f565      660d00060000   or eax, 0x600
      ||    f000:f56b      0f22e0         mov cr4, eax
      ||    f000:f56e      b81800         mov ax, 0x18
      ||    f000:f571      8ed8           mov ds, ax
      ||    f000:f573      8ec0           mov es, ax
      ||    f000:f575      8ee0           mov fs, ax
      ||    f000:f577      8ee8           mov gs, ax
      ||    f000:f579      8ed0           mov ss, ax
      ||    f000:f57b      66be92fdffff   mov esi, 0xfffffd92          ; 4294966674
      ||    f000:f581      662eff2c       ljmp cs:[si]
      |`.-> f000:f585      fa             cli
      | :   f000:f586      f4             hlt
      | `=< f000:f587      ebfc           jmp 0xff585

To sum up the weirdness, this BIOS code reads itself at 0xffff0 and compares the byte with 0xea, which is exactly the opcode of a far jump:

            f000:f52a      b800f0         mov ax, 0xf000
            f000:f52d      8ec0           mov es, ax
            f000:f52f      6726a0f0ff00.  mov al, byte es:[0xfff0]     ; [0xfff0:1]=0
            f000:f536      3cea           cmp al, 0xea

If it finds the code at 0xffff0 is a far jump, then it will go into an infinite loop.
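The check can be reproduced against the dump itself: in 16-bit code, opcode `EA` is a direct far jump followed by a 16-bit offset and a 16-bit segment, both little-endian. A sketch assuming bios.dump is the 1 MiB image produced above (helper names are hypothetical):

```python
from typing import Optional, Tuple


def decode_far_jmp(code: bytes) -> Optional[Tuple[int, int]]:
    """Decode `jmp ptr16:16` (opcode EA); return (segment, offset) or None."""
    if len(code) < 5 or code[0] != 0xEA:
        return None
    offset = int.from_bytes(code[1:3], "little")
    segment = int.from_bytes(code[3:5], "little")
    return segment, offset


def reset_stub_far_jmp(dump: bytes) -> Optional[Tuple[int, int]]:
    """Check whether the 5 bytes at physical 0xffff0 of the dump are a far jump."""
    return decode_far_jmp(dump[0xFFFF0:0xFFFF5])
```

On the dump shown above this returns None, because the bytes there start with `0f 09` (wbinvd). Taken one byte in, the five bytes `EA 5B E0 00 F0` read as the little-endian dword 0xf000e05b, which is the value the second answer's listing compares against at es:[0xfff1]; it encodes `jmp f000:e05b`.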

More precisely, the APs (Application Processors) will loop infinitely at the hlt instruction, while the BSP (Boot Strap Processor) will loop back to the beginning 0xffff0. Since the code at 0xffff0 won't be changed, we can conclude the BSP will always find the byte being 0xea and will never go out of the loop.
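The `mov cx, 0x1b` / `rdmsr` / `test ah, 1` sequence reads the IA32_APIC_BASE MSR and tests its BSP flag. AH is bits 15:8 of EAX, so `test ah, 1` tests bit 8, which the Intel SDM defines as the BSP flag. A sketch of that decoding on the EAX value returned by rdmsr (function names are hypothetical; the sample values assume the default APIC base 0xfee00000):

```python
IA32_APIC_BASE = 0x1B        # MSR index placed in ECX before rdmsr
BSP_FLAG = 1 << 8            # bit 8 of EAX == bit 0 of AH, hence `test ah, 1`
APIC_GLOBAL_ENABLE = 1 << 11


def is_bsp(eax: int) -> bool:
    """True if the low dword of IA32_APIC_BASE marks this thread as the BSP."""
    return bool(eax & BSP_FLAG)


def apic_base(eax: int) -> int:
    """APIC base address bits held in EAX (page aligned)."""
    return eax & 0xFFFFF000
```

With EAX = 0xfee00900 the BSP flag is set and the code would take the `ljmp 0xf000:0xfff0`; with 0xfee00800 (an AP) it would fall to `cli; hlt`.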

So what's the purpose of this self-checking? I can hardly believe it's a naive attempt to prevent modification.

Answer 1:

Although it's hard to reason about, remember that the load mov al, byte es:[0xfff0] is not reading from the BIOS's first instruction, even though es is set to 0xf000.

The first instruction is fetched from 0xfffffff0. The PCH will probably also alias 0xf0000-0xfffff to 0xffff0000-0xffffffff at reset, so when the BSP boots it will execute the code you dumped.
IIRC, the APs don't boot unless explicitly woken up.

The BSP will then proceed with initialising the HW (judging from the dump).
At some point it will set the attribute map for the 0xf0000-0xfffff range to steer reads and writes (or just writes and then reads) to memory.
The end result is that when a processor (an HW thread) boots, it will execute the code from the flash until it performs a far jump.
At that point, the CS base is recomputed as per real-mode rules (pretty much like in unreal mode), and instructions will be fetched from 0xf0000-0xfffff (i.e. from RAM).
All of this while the CS segment value didn't actually change.
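The distinction between the CS selector and its hidden base is what makes this work. Per the Intel SDM, after reset CS holds selector 0xf000 but base 0xffff0000, with EIP 0xfff0, so the first fetch is at 0xfffffff0; a real-mode far jump reloads CS and recomputes the base as selector * 16. A minimal model of that (the `SegReg` class and helpers are illustrative, not a real API):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SegReg:
    selector: int   # visible part, e.g. what `mov ax, cs` returns
    base: int       # hidden descriptor-cache base actually used for fetches


# Architectural reset state (Intel SDM Vol. 3, "Processor State After Reset")
RESET_CS = SegReg(selector=0xF000, base=0xFFFF0000)
RESET_IP = 0xFFF0


def fetch_addr(cs: SegReg, ip: int) -> int:
    """Linear address of the next instruction fetch."""
    return cs.base + ip


def far_jump_real_mode(selector: int, offset: int) -> Tuple[SegReg, int]:
    """A real-mode far jump reloads CS, so the base becomes selector * 16."""
    return SegReg(selector, selector << 4), offset
```

The same selector, 0xf000, thus yields two different fetch addresses (0xfffffff0 before the far jump, 0xffff0 after it), depending only on whether CS has been reloaded.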

The BSP at some point will start its multiprocessor initialisation routine, where it broadcasts to everyone (including itself) an INIT-SIPI-SIPI that will result in a sleep for the APs and an ljmp 0xf000:0xfff0 for the BSP.
The trick here is that the target of the jump, 0xf000:0xfff0, is not the same bus address of the wbinvd instruction.
There could be something else there, probably another initialisation routine.

At the end of the initialisation, the BIOS could simply reset the attributes of 0xf0000-0xfffff to fall through to the flash (so that a software reset is possible), unintentionally preventing a dump of the intermediate code.
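The "attribute map" here is the host bridge's PAM (Programmable Attribute Map) registers. A sketch assuming the layout in Intel's 2nd/3rd-generation Core datasheets, where PAM0 at host bridge config offset 0x80 uses bits 5:4 (HIENABLE) for 0xf0000-0xfffff; verify against the datasheet for the actual part (helper names are hypothetical):

```python
from typing import Dict

PAM0_OFFSET = 0x80   # assumed config offset of PAM0 in the host bridge (00:00.0)

# HIENABLE value -> (where reads go, where writes go) for 0xf0000-0xfffff
ROUTING = {
    0b00: ("DMI", "DMI"),     # DRAM disabled: everything goes to DMI (i.e. the flash)
    0b01: ("DRAM", "DMI"),    # read-only: reads hit DRAM, writes go to DMI
    0b10: ("DMI", "DRAM"),    # write-only: writes hit DRAM, reads go to DMI
    0b11: ("DRAM", "DRAM"),   # normal DRAM
}


def f_segment_routing(pam0: int) -> Dict[str, str]:
    """Decode PAM0 bits 5:4 into the routing of reads and writes for the F segment."""
    reads, writes = ROUTING[(pam0 >> 4) & 0b11]
    return {"reads": reads, "writes": writes}
```

On Linux the raw byte could be read as root with `setpci -s 00:00.0 80.b` and passed to `f_segment_routing`; a value of 0b00 in bits 5:4 corresponds to the "fall through to the flash" case described above, while 0b10 is the "just writes and then reads" staging step.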

This is not very efficient, but BIOSes are not usually masterpieces of code.

I don't have enough information to be sure what's going on; my point is that the ljmp 0xf000:0xfff0 and the mov al, byte es:[0xfff0] don't have to read from the same region they reside in.
With this in mind, all bets are off.
Only a proper reverse engineering will tell.

Regarding the wbinvd, I suggested in the comments that it could be related to the warm boot facility, and Peter Cordes suggested that it may specifically have to do with cache-as-RAM.
It makes sense, though I guess we will never be sure.
It could just as well be a case of cargo cult, where a programmer deemed the instruction necessary based on rumors.



Answer 2:

This is actually the answer to the title question:

Hadi Brais: According to slide 14 of BIOS and System Management Mode Internals, the wbinvd instruction was there in UDK2010 but was later removed in UDK2012. Perhaps it's security-related. I don't know exactly what.

I can confirm that this instruction is not present at 0xfffffff0 on my machine from 2017.

There is a more burning question here, and that's what the comparison with 0xea means.

Here is the code that the reset vector at 0xfffffff0 jumps to on my machine:

0x00:  DB E3                      fninit 
0x02:  0F 6E C0                   movd   mm0, eax   //move BIST value to mm0
0x05:  0F 31                      rdtsc  
0x07:  0F 6E EA                   movd   mm5, edx
0x0a:  0F 6E F0                   movd   mm6, eax  //save tsc
0x0d:  66 33 C0                   xor    eax, eax //clear eax

0x10:  8E C0                      mov    es, ax
0x12:  8C C8                      mov    ax, cs
0x14:  8E D8                      mov    ds, ax
0x16:  B8 00 F0                   mov    ax, 0xf000
0x19:  8E C0                      mov    es, ax
0x1b:  67 26 A0 F0 FF 00 00       mov    al, byte ptr es:[0xfff0]
0x22:  3C EA                      cmp    al, 0xea
0x24:  74 0E                      je     0x34   //if ea is at ffff0h then jump to the 0xf000e05b check 

0x26:  BA F9 0C                   mov    dx, 0xcf9
0x29:  EC                         in     al, dx    //read port 0xcf9
0x2a:  3C 04                      cmp    al, 4    
0x2c:  75 25                      jne    0x53      
0x2e:  BA F9 0C                   mov    dx, 0xcf9 //perform hard reset since if CPU only reset is issued not all MSRs are restored to their defaults
0x31:  B0 06                      mov    al, 6
0x33:  EE                         out    dx, al  

0x34:  67 66 26 A1 F1 FF 00 00    mov    eax, dword ptr es:[0xfff1]
0x3c:  66 3D 5B E0 00 F0          cmp    eax, 0xf000e05b
0x42:  75 0F                      jne    0x53      //if it isn't, move to notwarmstart; it's not a warm start because BIOS shadow isn't present

0x44:  B9 1B 00                   mov    cx, 0x1b //if it is equal, read bsp bit from apic_base msr
0x47:  0F 32                      rdmsr  
0x49:  F6 C4 01                   test   ah, 1
0x4c:  74 41                      je     0x8f   //if the and operation with 00000001b produces a zero result i.e. it's an AP then jump to cli, hlt

0x4e:  EA F0 FF 00 F0             ljmp   0xf000:0xfff0 //if it's the BSP and the shadow ROM is present, jump to 0xffff0

notwarmstart:
0x53:  B0 01                      mov    al, 1
0x55:  E6 80                      out    0x80, al  //send 1 as a debug POST code
0x57:  66 BE 68 FF FF FF          mov    esi, 0xffffff68
0x5d:  66 2E 0F 01 14             lgdt   cs:[si] //loads a 6-byte GDT pointer (16-bit limit + 32-bit base rather than a 24-bit base, due to the 66 prefix) from cs:si = 0xffff0000 + 0xff68 = 0xffffff68 into GDTR (base: ffffff28, limit: 003f); this accesses the high alias, not the shadow ROM

//enter 16 bit protected mode//
0x62:  0F 20 C0                   mov    eax, cr0
0x65:  66 83 C8 03                or     eax, 3   //Set PE bit (bit #0) & MP bit (bit #1)
0x69:  0F 22 C0                   mov    cr0, eax  //Activate protected mode
0x6c:  0F 20 E0                   mov    eax, cr4 
0x6f:  66 0D 00 06 00 00          or     eax, 0x600 //Set OSFXSR bit (bit #9) & OSXMMEXCPT bit (bit #10)
0x75:  0F 22 E0                   mov    cr4, eax

//set up selectors for 32 bit protected mode entry
0x78:  B8 18 00                   mov    ax, 0x18 //segment descriptor at 0x18 in GDT is (raw): 00cf93000000ffff
0x7b:  8E D8                      mov    ds, ax
0x7d:  8E C0                      mov    es, ax
0x7f:  8E E0                      mov    fs, ax
0x81:  8E E8                      mov    gs, ax
0x83:  8E D0                      mov    ss, ax
0x85:  66 BE 6E FF FF FF          mov    esi, 0xffffff6e
0x8b:  66 2E FF 2C                ljmp   cs:[si]   //far jump into flat 32-bit protected mode through the m16:32 pointer at cs:si. CS has not been reloaded, so it still has its reset base 0xffff0000
                                                   //and this reads the pointer at 0xffff0000 + 0xff6e = 0xffffff6e, whose target offset is 0xfffffcd8. PEI begins at that address

0x8f:  FA                         cli    
0x90:  F4                         hlt    
.
.
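The raw descriptor in the comment at 0x78 can be decoded field by field to confirm it is a flat 4 GiB data segment. A Python sketch following the segment descriptor layout in the Intel SDM (helper names are hypothetical):

```python
def decode_selector(sel: int) -> dict:
    """Split a segment selector into table index, table indicator and RPL."""
    return {"index": sel >> 3, "ti": (sel >> 2) & 1, "rpl": sel & 3}


def decode_descriptor(raw: int) -> dict:
    """Decode a raw 64-bit legacy segment descriptor."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF
    flags = (raw >> 52) & 0xF
    granular = bool(flags & 0x8)                        # G: limit counted in 4 KiB units
    return {
        "base": base,
        "limit": ((limit << 12) | 0xFFF) if granular else limit,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 3,
        "code": bool(access & 0x08),                    # type bit 3: executable
        "writable": bool(access & 0x02),                # type bit 1 (data segments)
        "big": bool(flags & 0x4),                       # D/B: 32-bit default size
    }
```

Selector 0x18 is GDT index 3 with RPL 0, and 00cf93000000ffff decodes to base 0, a 4 GiB limit (G=1, limit 0xfffff), a present read/write data segment with DPL 0, and the D/B bit set, which matches the "flat 32 bit" description.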

We notice that my code differs from yours. There is an extra comparison to 0xf000e05b and a read/write to 0xcf9.

A clue in the edk2 source code is that the code being jumped to is called 'NotWarmStart'. The code speaks for itself. The key to solving this is analysing the 3 different implementations carefully (plus your observations from UEFI legacy boot vs. UEFI boot).

In mine, if 0xEA is at 0xFFFF0, it then checks 0xFFFF1 for 0xf000e05b. If 0xf000e05b is there, it checks the BSP flag, and if this is the BSP, it jumps to 0xFFFF0. If 0xf000e05b isn't there, it jumps to the 16-bit + 32-bit protected mode setup (called 'NotWarmStart'), which then jumps to the 32-bit flat protected mode code through the far pointer stored at 0xffffff6e. edk2 calls this PEI, but I'd say PEI classically begins at the PEI core, and the code it jumps to is actually still SEC, given that it uses FSP to set up CAR, optionally performs microcode updates if BootGuard isn't present, and passes control to the PEI core. If 0xEA is not present, it reads port 0xcf9 and checks whether it equals 4 (only bit 2, RST_CPU, set), which edk2 describes as 'Check INIT# is asserted'. If it is, it performs a hard reset by writing 0x6, which results in a PLTRST#; the stated reason is 'issue warm start, since if CPU only reset is issued not all MSRs are restored to their defaults'. If it isn't, it jumps to 'NotWarmStart'.
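The 0xcf9 logic at 0x26-0x33 can be sketched as follows. The bit names assume the Reset Control Register layout in Intel PCH datasheets (bit 1 SYS_RST, bit 2 RST_CPU, bit 3 FULL_RST); `cf9_action` and `cf9_bits` are hypothetical helpers, and reading 4 as "the last reset was INIT-only" is the interpretation above, not something the code itself proves:

```python
from typing import List, Optional

# Reset Control Register (I/O port 0xcf9) bits, as documented for Intel PCHs (assumed here)
CF9_BITS = [
    ("SYS_RST", 1 << 1),   # 1: hard reset (PLTRST#), 0: soft reset (INIT#)
    ("RST_CPU", 1 << 2),   # a 0 -> 1 transition starts the reset selected by SYS_RST
    ("FULL_RST", 1 << 3),  # 1: full power-cycle reset
]


def cf9_bits(value: int) -> List[str]:
    """Names of the bits set in a value read from or written to port 0xcf9."""
    return [name for name, mask in CF9_BITS if value & mask]


def cf9_action(value: int) -> Optional[int]:
    """Mirror of 0x26-0x33: the byte written back to 0xcf9, or None for NotWarmStart."""
    if value == 0x04:          # cmp al, 4: only RST_CPU set
        return 0x06            # SYS_RST | RST_CPU: request a hard reset
    return None
```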

There are two suggestions for why 0xffff0 can be shown to contain a different value than 0xfffffff0 at reset. 1) RAM contains data and the PAMs are steering the 0xf0000-0xfffff range to RAM rather than SPI ROM. RAM would only contain data if some kind of soft reset occurred, like INIT#, where RAM is unaffected. 2) UEFI legacy boot causes Intel ME to set the PAMs to default to RAM rather than SPI ROM, or disables the BIOS decode enable bit BIOS_LEGACY_F_EN on the LPC or SPI bridge (which seems a bit unlikely and elaborate to me, and I feel like the default values will hold at the reset vector).

At runtime, your dump shows identical code at 0xffff0 and 0xfffffff0 for UEFI boot but different code for UEFI legacy boot. It looks to me like in UEFI mode, there is no shadow ROM in RAM at 0xffff0. You're probably directly accessing the SPI ROM because there's no reason for that range to be touched (legacy option ROMs aren't required and I've got legacy option ROMs shadowed in my UEFI legacy boot system. In UEFI mode, there will be DXE drivers present in the XROMBAR space that will be used instead).

Just looking at your code it's easy to say: the check for 0xea is saying 'if 0xea isn't there then it's a UEFI boot, so jump to 32 bit SEC and determine whether warm or not later'. 'if 0xea is there then it's a warm start and the previous boot was a legacy boot, so jump to the shorthand implementation at 0xffff0'.

The problem is, my code reveals the 3rd option, and it has to be there for a reason. 0xffff0 can be in 3 different states. Not containing 0xea (jumps to 32 bit SEC); containing 0xea and 0xf000e05b (if BSP, jumps to 0xffff0 otherwise hlt); containing 0xea and not 0xf000e05b (jumps to 32 bit SEC).
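The three states of 0xffff0 map onto the branch structure of the listing at 0x1b-0x4e. A sketch over a 1 MiB image of what `es:[0xfff0]` would read (flash or shadow RAM, depending on the PAM setting); the function and its return labels are hypothetical:

```python
def classify_reset_path(dump: bytes, is_bsp: bool) -> str:
    """Hypothetical mirror of the branches at 0x1b-0x4e of the listing above.

    `dump` is a 1 MiB image of what es:[0xfff0] would see (flash or shadow RAM).
    """
    if dump[0xFFFF0] != 0xEA:                                     # cmp al, 0xea / je 0x34
        return "cf9_check"                                        # then NotWarmStart or a hard reset
    if int.from_bytes(dump[0xFFFF1:0xFFFF5], "little") != 0xF000E05B:
        return "not_warm_start"                                   # jne 0x53
    return "bsp_ljmp" if is_bsp else "ap_halt"                    # rdmsr / test ah, 1
```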

My guess is that containing 0xea and 0xf000e05b means it is a legacy boot and a warm start. Containing 0xea but not 0xf000e05b means it is a warm UEFI boot. Not containing 0xea means the RAM contains nothing useful in either mode, and if it's actually a warm boot, then it needs to issue a PLTRST#. That's more or less the only option remaining.

That leads me to theorise that, since that 3rd check doesn't occur in your UEFI BIOS, you see identical code in UEFI mode, whereas if I were to boot into UEFI mode, I reckon I'd see different code at 0xffff0 than at 0xfffffff0, and different again from what I'd see in UEFI legacy boot. This is possibly some 16-bit shorthand for UEFI warm boot in shadowed RAM, which is still present after a warm boot, and UEFI will detect and jump to it or use this data later on. On your system, the shadow RAM at that location is not being used, and accesses are directed to the SPI ROM instead.

Maybe yours implements it differently: it shadows to a different region of the 1 MiB space using a different PAM and detects it later on (and therefore doesn't need to qualify the 0xea with an extra step). It may assume the UEFI shadow in the 700 MiB range is corrupt (because the OS could overwrite it while some of it remains resident; I'm not sure what the policy is on this). The 1 MiB range may be the only safe place to shadow warm start data, and it can't shadow to 0xff000000-0xffffffff, as that range can only ever be decoded to DMI and the DRAM behind it is typically reclaimed elsewhere. If it assumes the OS doesn't overwrite the UEFI data in RAM, then your shadow might not be in the lower 1 MiB at all, and the check further in may be checking the 700 MiB region for the warm start implementation. The warm start implementation will assume services are loaded and devices are already enumerated, and will let you select a new boot device if you want.

The reason edk2 calls the routine 'NotWarmStart', even though it doesn't check RAM or support warm start the way our implementations do, is, I'd imagine, that 0xcf9 tells the processor whether a warm boot / soft reset has occurred on the system (i.e. an INIT# packet has been sent to the processor: bit 2 (RST_CPU) is set but bit 1 (SYS_RST) is clear, and the code currently executing is implicitly on the processor that was INITed; I can only assume these bits are cleared after a reset only by PLTRST# or by writing 0 to them). Therefore it can still tell that it is a warm start, but it needs to perform a PLTRST# (whether the RAM contains useful data or not), because the warm start system state will never be used.

Also, there is no loop at hlt: HLT enters a HALT state, and an INIT# IPI puts the processor into a wait-for-SIPI state. Execution will then begin at whatever address the BSP selects for the AP.
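For the SIPI part, the Intel SDM defines the start address from the 8-bit vector VV: the AP begins in real mode at VV00:0000, i.e. physical 0xVV000. A sketch of that, plus the low ICR dword a BSP would write to send INIT and SIPI; the "all excluding self" shorthand is an assumption and is left as a parameter, and the helper names are hypothetical:

```python
from typing import Tuple

INIT = 0b101                      # ICR delivery mode for INIT (bits 10:8)
STARTUP = 0b110                   # ICR delivery mode for Start-up IPI
LEVEL_ASSERT = 1 << 14
ALL_EXCLUDING_SELF = 0b11 << 18   # destination shorthand (bits 19:18), assumed here


def icr_low(delivery_mode: int, vector: int = 0, shorthand: int = ALL_EXCLUDING_SELF) -> int:
    """Low dword of the local APIC ICR for an INIT or SIPI broadcast."""
    return shorthand | LEVEL_ASSERT | (delivery_mode << 8) | (vector & 0xFF)


def sipi_start(vector: int) -> Tuple[int, int, int]:
    """Real-mode CS, IP and physical address where an AP starts for a SIPI vector."""
    cs = (vector & 0xFF) << 8
    return cs, 0, cs << 4
```

For example, a SIPI with vector 0x9f would start each AP at 9f00:0000, physical 0x9f000, which is how the BSP "selects" the AP's entry point.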