I read an article about the Meltdown/Spectre exploits, which allow reading privileged data from the kernel using hardware bugs in the CPU. It says:
The trick is to line up instructions in a normal user process that cause the processor to speculatively fetch data from protected kernel memory before performing any security checks. The crucial Meltdown-exploiting x86-64 code can be as simple as...
    ; rcx = kernel address
    ; rbx = probe array
    retry:
    mov al, byte [rcx]
    shl rax, 0xc
    jz retry
    mov rbx, qword [rbx + rax]
Trying to fetch a byte from the kernel address as a user process triggers an exception – but the subsequent instructions have already been speculatively executed out of order, and touch a cache line based on the content of that fetched byte.
An exception is raised, and handled non-fatally elsewhere, while the out-of-order instructions have already acted on the content of the byte. Doing some Flush+Reload magic on the cache reveals which cache line was touched and thus the content of the kernel memory byte. Repeat this over and over, and eventually you dump the contents of kernel memory.
Can someone explain how this Flush+Reload magic is done and how it can reveal the touched cache line?
// Further down, there is pseudocode in C# that shows the complete process.
We have a kernel address in rcx, which is the address of one byte (let's call the value of that byte "X") in kernel memory space that we want to leak. The currently running user process is not allowed to access this address; attempting to do so raises an exception.
We have the probe array with a size of 256 * 4096 bytes in user space, which we can freely access. So, this is just a normal array that is exactly 256 pages long. The size of one page is 4096 bytes.
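To make the layout of the probe array concrete, here is a small sketch in Python. It only does the size and offset arithmetic described above; the names PAGE_SIZE, NUM_VALUES and page_start are my own and do not come from the article.

```python
PAGE_SIZE = 4096    # bytes per page, as stated above
NUM_VALUES = 256    # a byte can take 256 different values

# One page per possible byte value.
PROBE_ARRAY_SIZE = NUM_VALUES * PAGE_SIZE


def page_start(value: int) -> int:
    """Offset (in bytes) of the page that stands for `value` in the probe array."""
    if not 0 <= value < NUM_VALUES:
        raise ValueError("value must fit in one byte")
    return value * PAGE_SIZE


print(PROBE_ARRAY_SIZE)   # 1048576 bytes, i.e. 1 MiB
print(page_start(0))      # 0
print(page_start(84))     # 344064
print(page_start(255))    # 1044480
```

The point of spacing the entries one page apart is that every possible byte value gets its own, clearly separated region of memory.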
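First, a flush operation is executed (first part of "Flush+Reload"). This clears the cache so that no part of the probe array is cached. In practice this is done by flushing the cache lines of the probe array (the clflush instruction on x86) rather than by emptying the whole L1 cache. (We don't see that in the code in the OP.)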
Then we execute the code mentioned in the OP.
We read the byte value X at the kernel address that we want to leak and store it in the rax register (more precisely in al, its lowest byte). This instruction will trigger an exception because we are not allowed to access this memory address from user-level code.
However, because the test whether we are allowed to access this address takes some time, the processor will already start executing the following statements. So, we have the byte value X that we want to know stored in the rax register for these statements.
We multiply this secret value X by 4096 (the page size). This is what the shift left by 0xc (12 bits) does.
Now we add the calculated value in the rax register to the start of our probe array and get an address that points into the Xth page within the memory space that makes up our probe array.
Then we access the data at that address, which means that the Xth page of the probe array (more precisely, the cache line we touched within it) is loaded into the cache.
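The address arithmetic of these steps can be sketched in Python as follows. This is only an emulation of the integer math, not of the speculative execution itself. The base address 0x10000000 is a made-up, page-aligned example, the function names are mine, and I assume the upper bits of rax are zero before the byte is loaded.

```python
PAGE_SHIFT = 0xC  # shl rax, 0xc shifts left by 12 bits


def probe_address(probe_base: int, secret_byte: int) -> int:
    """Address touched by `mov rbx, qword [rbx + rax]` for a given secret byte."""
    rax = secret_byte & 0xFF   # mov al, byte [rcx]: one byte (upper bits of rax assumed zero)
    rax <<= PAGE_SHIFT         # shl rax, 0xc: same as multiplying by 4096
    return probe_base + rax    # rbx + rax


def recover_byte(probe_base: int, touched_address: int) -> int:
    """Inverse mapping: which byte value does a touched address correspond to?"""
    return (touched_address - probe_base) >> PAGE_SHIFT


base = 0x10000000  # hypothetical start of the probe array
addr = probe_address(base, 84)

print((84 << PAGE_SHIFT) == 84 * 4096)   # True
print(hex(addr))                         # 0x10054000
print(recover_byte(base, addr))          # 84
```

So if we can find out which address inside the probe array was touched, shifting its offset right by 12 bits gives back X.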
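Now, none of the probe array is cached (because we flushed it explicitly before), except for the part of the Xth page that we just touched. The kernel memory containing X may be cached as well, because we just read it, but that does not matter for the next step.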
Now, the second part of "Flush+Reload" begins. One after the other, we read each page in the probe array, measuring the time that takes. So, altogether, we load 256 pages. 255 of these page loads will be rather slow (because the associated memory is not in the L1 cache yet), but one load (that of the Xth page) will be quite fast (because it was in the L1 cache before).
Now, because we find that loading the Xth page was fastest, we know that X is the value that is at the kernel address that we wanted to leak.
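The flush, transient access and reload steps above can be played through with a toy model in Python. This is a simulation under strong assumptions, not an exploit: the "cache" is just a set of page numbers, the latencies FAST and SLOW are made-up values, and ToyCache and leak_byte are names I invented for illustration. On real hardware the timings are noisy and would come from a cycle counter.

```python
PAGE_SIZE = 4096
NUM_PAGES = 256
FAST, SLOW = 40, 300   # made-up latencies in "cycles", for illustration only


class ToyCache:
    """Remembers which probe-array pages were touched since the last flush."""

    def __init__(self):
        self.cached_pages = set()

    def flush(self):
        """Forget everything: afterwards no probe-array page is cached."""
        self.cached_pages.clear()

    def access(self, offset: int) -> int:
        """Touch the probe array at `offset` and return the simulated latency."""
        page = offset // PAGE_SIZE
        latency = FAST if page in self.cached_pages else SLOW
        self.cached_pages.add(page)
        return latency


def leak_byte(secret_byte: int) -> int:
    cache = ToyCache()
    cache.flush()                            # 1. flush
    cache.access(secret_byte * PAGE_SIZE)    # 2. transient access leaves a trace
    # 3. reload: time one access per page of the probe array
    timings = [cache.access(page * PAGE_SIZE) for page in range(NUM_PAGES)]
    return timings.index(min(timings))       # the fastest page reveals the byte


print(leak_byte(84))   # 84
```

In this model 255 of the reloads report the slow latency and exactly one reports the fast one, and the index of that one is the leaked byte.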
The Meltdown paper contains a graphic showing the time measurements of loading the pages within the probe array: exactly one page loads much faster than all the others.
In that case, X was 84.
Pseudocode in C# that shows the complete process: