I am confused about page table management in the Linux kernel.
In Linux kernel space, before the page table is turned on, the kernel runs in virtual memory with a 1-to-1 mapping mechanism. After the page table is turned on, the kernel has to consult the page tables to translate a virtual address into a physical memory address. My questions are:
1. At this time, after turning on the page table, is kernel space still 1GB (from 0xC0000000 to 0xFFFFFFFF)?
2. In the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?
3. Is the mapping of an address the same before and after turning on the page table? E.g. before turning on the page table, the virtual address 0xC00000FF is mapped to the physical address 0x000000FF; after turning on the page table, that mapping does not change, and the virtual address 0xC00000FF is still mapped to the physical address 0x000000FF. The only difference is that after turning on the page table, the CPU has to consult the page table to translate a virtual address into a physical address, which it did not need to do before.
4. Is the page table of kernel space global, i.e. shared across all processes in the system, including user processes?
5. Is this mechanism the same on 32-bit x86 and ARM?
When Linux enables the MMU, it is only required that the virtual addresses of the kernel space are mapped. This happens very early in booting. At this point, there is no user space. There is no restriction preventing the MMU from mapping multiple virtual addresses to the same physical address. So, when enabling the MMU, it is simplest to have both a virt==phys mapping for the kernel code space and the link==phys mapping (the 0xC0000000 mapping).

If the physical code address is 0xFF and the final link address is 0xC00000FF, then we have a duplicate mapping when turning on the MMU. Both 0xFF and 0xC00000FF map to the same physical page. A simple jmp (jump) or b (branch) will move from one address space to the other. At this point, the virt==phys mapping can be removed, as we are executing at the final destination address.

I think the above should answer points 1 through 3. Basically, the booting page tables are not the final page tables.
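As a minimal sketch of that jump between address spaces (hypothetical code, not taken from the kernel; the label virtual_continue is made up), assuming the MMU has just been enabled with both the virt==phys and the 0xC0000000 mappings in place:

        @ We are still executing through the virt==phys alias here.
        ldr     r3, =virtual_continue   @ the assembler emits the link-time (0xC...) address
        mov     pc, r3                  @ branch into the 0xC0000000 alias of the same code
    virtual_continue:
        @ From here on we execute at the final link address, so the
        @ temporary virt==phys entries can be dropped.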
As for point 4: yes, this is a big win with a VIVT cache and for many other reasons.
As for point 5: of course the underlying mechanics are different. They are different even for different processors within these families (486 vs P4 vs AMD-K6; ARM926 vs Cortex-A5 vs Cortex-A8, etc.). However, the semantics are very similar.
See: Bootmem@lwn.net - An article on the early Linux memory phase.
Depending on the version, different memory pools and page table mappings are active during boot. The mappings we are all familiar with do not need to be in place until init runs.

The following discussion is based on 32-bit ARM Linux; the kernel source version is 3.9.
All your questions can be addressed if you go through the procedure of setting up the initial page table (which will be overwritten later by the function paging_init) and turning on the MMU.

When the kernel is first launched by the bootloader, the assembly function stext (in arch/arm/kernel/head.S) is the first function to run. Note that the MMU has not been turned on yet at this moment. Among other things, the two important jobs done by this function stext are: creating the initial page table (which, again, will be overwritten later by paging_init), and turning on the MMU.

Before delving into your questions, it is beneficial to know that before the MMU is turned on, every address issued by the CPU is a physical address, while after the MMU is turned on, every address issued by the CPU is a virtual address; also, by convention, the kernel occupies the upper 1GB of the virtual address space (0xC0000000 and above).
Now the tricky part:
First trick: using position-independent code. The assembly function stext is linked at address PAGE_OFFSET + TEXT_OFFSET (0xCxxxxxxx), which is a virtual address; however, since the MMU has not been turned on yet, the actual address where stext is running is PHYS_OFFSET + TEXT_OFFSET (the actual value depends on your hardware), which is a physical address.

So, here is the thing: the code of function stext "thinks" it is running at an address like 0xCxxxxxxx, but it is actually running at (0x00000000 + some_offset) (say your hardware configures 0x00000000 as the starting point of RAM). So before turning on the MMU, the assembly code has to be written very carefully to make sure that nothing goes wrong during execution. In fact, a technique called position-independent code (PIC) is used.

To further explain the above, I extract several assembly code snippets:
Note that the above "ldr" instruction is a pseudo-instruction which means "get the (virtual) address of function __mmap_switched and put it into r13".
And the function __enable_mmu in turn calls the function __turn_mmu_on (note that several instructions that are essential to the function but not of interest here have been removed):
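(Again, the snippet below is a simplified reconstruction of __turn_mmu_on, an approximation of the 3.9 code rather than an exact copy.)

    __turn_mmu_on:
        @ r0 has been loaded by __enable_mmu with the desired control register value
        mcr     p15, 0, r0, c1, c0, 0   @ write the control register: this turns the MMU on
        mrc     p15, 0, r3, c0, c0, 0   @ read an ID register back (serialising read)
        mov     r3, r13                 @ r13 holds the virtual address of __mmap_switched
        mov     pc, r3                  @ jump there: from now on we run at 0xC... addresses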
Second trick: identical mapping when setting up the initial page table before turning on the MMU. More specifically, the address range in which the kernel code is running is mapped twice.

Why do that? Remember that before the MMU is turned on, every address issued by the CPU is a physical address (starting at 0x00000000), and after the MMU is turned on, every address issued by the CPU is a virtual address (starting at 0xC0000000).

Because the ARM core is pipelined, at the moment the MMU is turned on there are still instructions in the pipeline that use (physical) addresses generated by the CPU before the MMU was turned on. To keep these instructions from blowing up, an identical mapping has to be set up to cater to them.
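To make the second trick concrete, here is a minimal, hypothetical sketch of the kind of first-level entries the initial page table gets (this is not the literal __create_page_tables code; the register roles are assumptions). The 1MB section the kernel is physically running in is installed twice: once at the index of its physical address and once at the index of its 0xC0000000-based virtual address, both entries pointing at the same physical section.

        @ Assumed register roles (illustrative only):
        @   r4 = physical address of the initial first-level table (16KB, 4096 four-byte entries)
        @   r7 = MMU flags for a 1MB section descriptor
        @   r8 = physical address the kernel is currently executing from

        mov     r5, r8, lsr #20             @ section index of the physical address
        orr     r3, r7, r5, lsl #20         @ descriptor: physical section base | flags
        str     r3, [r4, r5, lsl #2]        @ identity entry:  table[phys >> 20] -> phys section

        ldr     r6, =0xC0000000             @ PAGE_OFFSET, where the kernel is linked
        mov     r6, r6, lsr #20             @ section index of the virtual address
        str     r3, [r4, r6, lsl #2]        @ kernel entry:    table[virt >> 20] -> same section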
Now returning to your questions:
Answer to question 1: I guess you mean turning on the MMU. The answer is yes, kernel space is still 1GB (actually it also occupies several megabytes below 0xC0000000, but that is not of interest here).
Answer to question 2: The answer to this question is quite complicated, because it involves a lot of details regarding specific kernel configurations.

To fully answer it, you need to read the part of the kernel source code that sets up the initial page table (the assembly function __create_page_tables) and the function that sets up the final page table (the C function paging_init).

To put it simply, there are two levels of page tables in ARM; the first-level page table is the PGD, which occupies 16KB. The kernel first zeroes out this PGD during the initialization process and does the initial mapping in the assembly function __create_page_tables. In __create_page_tables, only a very small portion of the address space is mapped.

After that, the final page table is set up in the function paging_init, and in this function a quite large portion of the address space is mapped. Say you only have 512MB of RAM: for most common configurations, that 512MB of RAM will be mapped by the kernel section by section (one section is 1MB). If your RAM is quite large (say 2GB), only a portion of it will be directly mapped. (I will stop here because there are too many details regarding question 2.)

Answer to question 3: I think I've already answered this question in my explanation of the second trick (identical mapping when setting up the initial page table before turning on the MMU).
Answer to question 4: Yes and no. Yes, because all processes share the same copy (content) of the kernel page table (the higher 1GB part). No, because each process uses its own 16KB of memory to store that kernel page table (although the content of the page table for the higher 1GB part is identical in every process).
Answer to question 5: Different architectures use different mechanisms.