How does Linux support more than 512GB of virtual

Posted 2019-03-30 02:03

Question:

The user virtual address space on x86-64 Linux is 47 bits wide, which means that Linux can map a process with a virtual address range of about 128 TB.

However, what confuses me is that the x86-64 architecture defines a 4-level hierarchical page table (arranged as a radix tree) for each process. The root of the page table can only map up to 512 GB of contiguous virtual address space. So how can Linux support more than 512 GB of virtual address range? Does it use multiple page tables for each process? If so, what should CR3 (the x86-64 register that holds the base address of the page table) contain for a given process? Am I missing something?

Answer 1:

"The root of the page table can only map up to 512 GB of contiguous virtual address space. So how can Linux support more than 512 GB of virtual address range? Does it use multiple page tables for each process? If so, what should CR3 (the x86-64 register that holds the base address of the page table) contain for a given process? Am I missing something?"

I don't know what you mean by "root of the page table", but paging on x86-64 looks like this:

  • Page tables (PT) - the lowest level of paging structures. Each has 512 8-byte entries (PTEs), each describing one 4 KiB page, so a PT describes 512 * 4 KiB = 2 MiB of memory (the PDE that would point to a PT can instead map a single 2 MiB page directly, but let's leave that aside for now).
  • Page directories (PD) - tables similar to PTs, containing 512 8-byte entries (PDEs) pointing to PTs; so a PD describes 512 * 2 MiB = 1 GiB of memory (similarly, the PDPTE that would point to a PD can instead map a single 1 GiB page directly).
  • Page directory pointer tables (PDPT) - similar to PDs, but containing 512 8-byte entries (PDPTEs) pointing to PDs; so a PDPT describes 512 * 1 GiB = 512 GiB of memory.
  • PML4, the highest level of paging structures, is a table containing 512 8-byte entries (PML4Es) pointing to PDPTs; so the PML4 describes 512 * 512 GiB = 256 TiB of memory.
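The arithmetic in the list above can be checked with a short sketch. It assumes 4 KiB base pages and 512 entries per table, as described; the function and constant names are purely illustrative.

```python
# Coverage of each x86-64 4-level paging structure, assuming 4 KiB pages
# and 512 entries per table. Names here are illustrative only.
PAGE_SIZE = 4 * 1024   # bytes mapped by one PTE
ENTRIES = 512          # entries per table (9 index bits)

KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40


def coverage_bytes(level):
    """Bytes of virtual address space described by one table at `level`.

    level 1 = PT, 2 = PD, 3 = PDPT, 4 = PML4.
    """
    return PAGE_SIZE * ENTRIES ** level


for name, level in [("PT", 1), ("PD", 2), ("PDPT", 3), ("PML4", 4)]:
    print(f"{name:5s} covers {coverage_bytes(level)} bytes")
```

One PDPT covers the 512 GiB mentioned in the question, and the PML4 above it multiplies that by another 512.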

I don't know the exact memory map of Linux, but probably the higher half (from -128 TiB to 0, i.e. from 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF) is reserved for the kernel, and the lower half (from 0 to 128 TiB, i.e. from 0x0000000000000000 to 0x00007FFFFFFFFFFF) is for userspace applications. So Linux supports 512 times the 512 GiB of virtual address range you are asking about; even Torvalds wouldn't say "we won't support PML4". I don't know what confuses you. Perhaps you missed that a page table maps 2 MiB and assumed it maps a single 4 KiB page. If there is anything I can clarify, ask about it.
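To make the CR3 part of the question concrete: with 4-level paging, CR3 holds the physical address of a single PML4, and the hardware picks one entry at each level using 9-bit slices of the virtual address. The sketch below only illustrates that bit slicing; `split_vaddr` is a hypothetical helper, not a kernel or CPU interface.

```python
def split_vaddr(vaddr):
    """Split a 48-bit virtual address into its 4-level paging indices.

    Returns (pml4_index, pdpt_index, pd_index, pt_index, page_offset),
    assuming 4 KiB pages and 9 index bits per level.
    """
    offset = vaddr & 0xFFF           # bits 0..11
    pt = (vaddr >> 12) & 0x1FF       # bits 12..20
    pd = (vaddr >> 21) & 0x1FF       # bits 21..29
    pdpt = (vaddr >> 30) & 0x1FF     # bits 30..38
    pml4 = (vaddr >> 39) & 0x1FF     # bits 39..47
    return pml4, pdpt, pd, pt, offset


# The first address past 512 GiB simply selects PML4 entry 1.
print(split_vaddr(512 * 2**30))            # (1, 0, 0, 0, 0)
# Top of the lower half: the last of PML4 entries 0..255.
print(split_vaddr(0x00007FFFFFFFFFFF))     # (255, 511, 511, 511, 4095)
# Bottom of the higher half: PML4 entry 256.
print(split_vaddr(0xFFFF800000000000))     # (256, 0, 0, 0, 0)
```

So crossing a 512 GiB boundary does not need a second page table hierarchy or a different CR3 value; it just moves to the next PML4 entry.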



Answer 2:

Typically process address spaces aren't shared, which means the page tables involved aren't shared between distinct processes either, and that holds at all 4 table levels.

Of course, the common (kernel) part is always present in all address spaces, so, in fact, there's some sharing, but the memory there is only accessible to the kernel itself.

Other than that, every process indeed has pretty much its own page tables, and there isn't any problem with using all 2^48 addresses in any one of them. At least, there's no special limitation on the part of the CPU, although there can be on the part of the OS.
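The address count here comes from 48 implemented virtual address bits: with 4-level paging, a 64-bit address is usable only if bits 63 through 47 are all equal (a "canonical" address). A minimal sketch of that check, with `is_canonical` as an illustrative name:

```python
def is_canonical(vaddr):
    """True if bits 63..47 of a 64-bit address are all 0 or all 1.

    This is the 48-bit (4-level paging) canonical form.
    """
    top = (vaddr & 0xFFFFFFFFFFFFFFFF) >> 47   # the 17 upper bits
    return top == 0 or top == 0x1FFFF


# Each half holds 2**47 addresses, so together they give 2**48.
lower_half = 0x00007FFFFFFFFFFF - 0x0000000000000000 + 1
upper_half = 0xFFFFFFFFFFFFFFFF - 0xFFFF800000000000 + 1
print(lower_half + upper_half == 2**48)        # True
print(is_canonical(0x0000800000000000))        # False: falls in the gap
```

The two canonical halves are the lower and higher ranges mentioned in the first answer; how an OS divides them between user space and kernel is its own policy.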