Why is the page size of Linux (x86) 4 KB, how is that calculated?

Published 2019-01-17 03:20

Question:

The default memory page size of the Linux kernel on the x86 architecture is 4 KB. I wonder how that was calculated, and why?

Answer 1:

The default page size is fixed by what the MMU (memory management unit) of the CPU supports. In 32-bit protected mode x86 supports two kinds of pages:

  • normal ones, 4 KiB
  • huge ones, 4 MiB

Not all x86 processors support huge pages; one needs a CPU with the Page Size Extension (PSE) capability, which excludes pre-Pentium processors. Virtually all current-generation x86 CPUs implement it.
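If you want to check that capability yourself, here is a minimal sketch (my own illustration, not part of the original answer) that reads the PSE bit (CPUID leaf 1, EDX bit 3) using GCC/Clang's <cpuid.h> helper on x86:

#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang CPUID helpers, x86 only */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1 reports basic feature flags; PSE is EDX bit 3. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not supported");
        return 1;
    }
    printf("PSE (4 MiB pages) supported: %s\n", (edx & (1u << 3)) ? "yes" : "no");
    return 0;
}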

4 KiB is a widely popular page granularity in other architectures too. One could argue that this size comes from the division of a 32-bit virtual address into two 10-bit indexes into the page directory and page tables, with the remaining 12 bits giving the 4 KiB page size.
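To make that split concrete, here is a small sketch (my own example, with made-up variable names) of how the 10/10/12 decomposition of a 32-bit virtual address looks in code:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t va = 0xC0ABCDEFu;                /* an arbitrary example address */
    uint32_t pd_index = (va >> 22) & 0x3FFu;  /* top 10 bits: page directory index */
    uint32_t pt_index = (va >> 12) & 0x3FFu;  /* next 10 bits: page table index */
    uint32_t offset   = va & 0xFFFu;          /* low 12 bits: offset within a 4 KiB page */

    printf("PD index = %u, PT index = %u, offset = 0x%03x\n", pd_index, pt_index, offset);
    return 0;
}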



Answer 2:

The choice of a 4 KB normal page size on the 32-bit architecture is actually very interesting :)

And I'd like to add an extra answer to demonstrate why it is reasonable.

x86 uses a '2-level page table' to translate virtual memory addresses into physical memory addresses.

So assume that both the page directory and the page table contain 2^x entries, and the page size is 2^y bytes. To make full use of the 32-bit address space, we have:

2^x * 2^x * 2^y = 2^32
=> 2x + y = 32

Each entry in the page directory/table consumes 4 bytes (32-bit), so a page holds 2^y / 2^2 entries, therefore:

2^y / 2^2 = 2^x
=> y - 2 = x

Substituting gives 2(y - 2) + y = 32, thus y = 12, and the page size in bytes is 2^y = 2^12 = 4KB.


And what about a '1-level page table'? This is interesting because logically we could use a single page table for address lookup.

Assume that the page directory contains 2^x entries, each mapping an address to the corresponding page, and the page size is 2^y bytes.

Again, to make full use of the 32-bit address space, we need:

2^x * 2^y = 2^32
=> x + y = 32

And since each entry still consumes 4 bytes:

2^y / 2^2 = 2^x
=> y - 2 = x

We get y = 17, and the page size is 2^y = 2^17 = 128KB.
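As a sanity check on both derivations above (and on the 64-bit case discussed in a later answer), here is a tiny sketch of my own that solves levels * x + y = va_bits together with x = y - log2(entry size) for the page-size exponent y:

#include <stdio.h>

/* y = (va_bits + levels * entry_log2) / (levels + 1), from
 * levels * (y - entry_log2) + y = va_bits */
static unsigned page_bits(unsigned va_bits, unsigned levels, unsigned entry_log2)
{
    return (va_bits + levels * entry_log2) / (levels + 1);
}

int main(void)
{
    /* 32-bit, 2 levels, 4-byte entries: y = (32 + 2*2)/3 = 12 -> 4 KB   */
    printf("2-level, 32-bit: %u KB\n", (1u << page_bits(32, 2, 2)) / 1024);
    /* 32-bit, 1 level,  4-byte entries: y = (32 + 1*2)/2 = 17 -> 128 KB */
    printf("1-level, 32-bit: %u KB\n", (1u << page_bits(32, 1, 2)) / 1024);
    /* 48-bit, 4 levels, 8-byte entries: y = (48 + 4*3)/5 = 12 -> 4 KB   */
    printf("4-level, 48-bit: %u KB\n", (1u << page_bits(48, 4, 3)) / 1024);
    return 0;
}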


We might also argue that, in the '2-level page table' version, the page directory and page table could have different sizes. However, this would mean using a larger page directory, which would occupy more than one memory page. Sadly, every time a new user process is spawned, the OS would have to allocate contiguous pages for its page directory, which is not elegant by design.



Answer 3:

That depends on the processor architecture.

The default page size is 4 KB on many architectures. It can generally be increased (sometimes by a lot, such as to 1 GB on AMD64) by switching to huge page mode. This allows the page table to be smaller, which can result in performance improvements.
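On Linux, one way to request a huge-page mapping explicitly is mmap with MAP_HUGETLB. A minimal sketch (assuming the administrator has already reserved huge pages, e.g. via /proc/sys/vm/nr_hugepages; otherwise the call fails):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2 * 1024 * 1024;  /* one 2 MiB huge page on x86-64 */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* typically ENOMEM if no huge pages are reserved */
        return 1;
    }
    printf("huge-page mapping at %p\n", p);
    munmap(p, len);
    return 0;
}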



Answer 4:

Introduction

The first Intel processor that supported the paging virtual memory technique was the Intel 80386. The processor supported a single page size, 4 KB. Since it was released in 1985, we have to go back to that period of time to understand why Intel chose that particular page size.

Atlas was the first computer to support paging, with a page size of 3 KB, and it had a profound influence on the design of virtual memory and motivated related research. The system was designed between 1958 and 1962. It's interesting to note that the page size supported by the 80386 is somewhat close to the page size supported by the Atlas, even though the 80386 was designed about 20 years later and computers (and the workloads they ran) had evolved radically during that period of time! In fact, many computers from that period used page sizes ranging between 0.5 and 5 KB. Researchers at the time spent a substantial amount of effort studying virtual memory systems (paging and segmentation).

One of the big questions was: What is the optimal page size? A large number of works were published in the 60s and 70s that attempt to study and understand the impact of the page size on the performance of applications and recommend or provide guidelines on how to choose a page size. There are certainly a number of papers that were never published. As far as I know, this includes the document from Intel that says "... Therefore, the page size should be 4 KB." But the factors that influence or interact with the page size and the process of choosing a page size (or multiple page sizes for that matter) are well known, which is what I'll discuss in this answer at a basic level. I'll also explain in particular why a page size of 4 KB is reasonable.

The Page Size Problem

In the paging method, physical memory is organized as a sequence of contiguous regions of memory, called page frames, that are of the same size (which is the defining characteristic of paging (1)). Each page frame can be mapped to an equally-sized chunk of a virtual address space, called a virtual page.

Suppose that a page consists of N bytes (2) (which implies that a page frame is also N bytes in size by definition) and consider a virtual address space that consists of P pages (i.e., the page numbers are {0, 1, 2, ..., P - 1} and the total number of virtual addresses is N * P). Suppose also that the physical address space consists of F page frames (i.e., the page frame numbers are {0, 1, 2, ..., F - 1} and the total number of physical addresses is N * F).

Given a virtual address VA, a mechanism (a mapping device) is required to determine the physical address, PA, it is mapped to or a page fault should be raised in case it was not mapped. The mapping device uses a data structure (the page table) stored somewhere to perform the mapping. There should be an entry in that table for each allocated virtual page that describes how the page is mapped and potentially some additional attributes (such as protection attributes). The design of the page table entry, as you'll see, interacts with the page size. I'll discuss the design of the page table entry in the Intel 80386 later.

The size of a virtual address is log2(N * P) and the size of a physical address is log2(N * F). Some bits of VA would represent the offset within the page while the other bits would represent the page number, which identifies the page using the mapping device.
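As a quick worked example with made-up numbers (not from the original answer): take N = 2^12 bytes per page and P = 2^20 virtual pages. Then:

size of a virtual address = log2(N * P) = log2(2^12 * 2^20) = 32 bits
                          = 20 bits of page number + 12 bits of offset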

How many options do we have for the page size? Well, it can be as little as one byte up to N * P or N * F, whichever is smaller. That's a lot of options.

It is Convenient for The Page Size to be a Power of 2

A virtual address, VA, is equivalent to a pair of page number and offset, (PN, OFF). The translation process should be as efficient as possible. It's convenient for programmers (3) to have the bytes within a page be contiguous in the address space. In this way, the addresses of the items within a multi-byte data structure can be calculated with simple arithmetic on a single address, which would constitute the base address of the data structure. This can be achieved by using the least significant log2(N) bits (rounded up) of a virtual address to represent the offset and the rest of the bits to represent the page number.

If N is not a power of 2, there will be some bits that are shared between the offset and page number, depending on the values of these bits. By making N a power of 2, such complications do not exist. So it would be neat to use page sizes that are a power of 2. All real processors that support paging use page sizes that are powers of two (although the unit of addressability may not be 8 bits), which makes sense. But to be honest, it's not clear whether this really matters. Using today's technology, whether N is a power of 2 may not have any measurable impact on performance or any other metric of interest. In fact, in the future, as larger and larger page sizes are needed, it may happen that some page size that is not a power of 2 is better. But so far, this has not happened. The point I'm trying to make here is that the design option of making the page size not a power of 2 should not be automatically dismissed. I believe this is a good opportunity for research in future virtual memory systems.
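A concrete way to see the convenience (my own sketch, not from the answer): with a power-of-2 page size the page number and offset fall out of a shift and a mask, while any other size needs an integer division and a remainder:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t va = 123456789u;

    /* N = 4096 = 2^12: the split is a shift and a mask */
    uint32_t pn_pow2  = va >> 12;
    uint32_t off_pow2 = va & 0xFFFu;

    /* N = 4000 (not a power of 2): the split needs division and modulo */
    uint32_t pn_other  = va / 4000u;
    uint32_t off_other = va % 4000u;

    printf("N = 4096: PN = %u, OFF = %u\n", pn_pow2, off_pow2);
    printf("N = 4000: PN = %u, OFF = %u\n", pn_other, off_other);
    return 0;
}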

Anyway, keeping in mind that the choice of 4 KB pages was made in the 80s, such extremely small variations in page sizes were shown (both theoretically and experimentally) to be of little importance. So the convenience of power-of-2 page sizes triumphed. This reduces the number of page sizes to consider exponentially. But we still have a good range of options.

Favoring Smaller Page Sizes

Since the mapping device works at the level of pages, the unit of allocation from the perspective of the operating system is a virtual page (4). Even if an application needs to allocate only 1 byte, it still has to tell the OS to allocate a whole virtual page for that 1 byte. This issue is called internal fragmentation. Each application (or maybe even each component of an application) has its own virtual address space from which it allocates memory in page-size chunks. Each application is expected not to use a whole page for a single object it allocates, but rather to allocate as many objects as possible from the same page before allocating more pages. However, because page attributes work at the level of whole pages, because the same application may use multiple user-mode memory managers (such as when using multiple C/C++ runtimes), and because it's difficult for applications to share the parts of pages they are not using with other applications, internal fragmentation can occur in many pages in the system. Using smaller page sizes can help reduce the amount of wasted physical (for the whole system) and virtual (per process) memory.

In addition, applications typically go through a number of phases throughout their lifetime, where they exhibit different memory requirements in different phases. If the page size is, say, 16 KB, but many applications require less than 10 KB for many of their phases, then there would be a lot of wasted physical memory, which may lead to out-of-memory situations that could have been avoided if smaller page sizes, such as 8 or 4 KB, were supported.

Smaller page sizes are also preferable for handling copy-on-write soft page faults, because the smaller the page, the less time it takes to create a copy of it. For extremely small page sizes, this may not make any measurable difference, depending on the memory bus bandwidth.

Typical amounts of physical memory available in computers in the 70s were in the range of tens or hundreds of KBs. It wouldn't make sense to have page sizes of hundreds of KBs or larger. In fact, the working sets of applications at that time were typically only a few KBs or tens of KBs in size. So even page sizes as small as 16 KB are unlikely to have been practical, because only a couple of pages may consume all of the available physical memory. The page size should be coherent with the amount of physical memory. This argument can of course be applied to today's systems as well (it wouldn't make sense to have 128-GB pages, for example).

So considering working set sizes and physical memory availability of the 70s and early 80s, the page size should be a power of 2 in the range 2^0-2^14. Cool, now we have only 15 options to choose from.

Favoring Larger Page Sizes

We can also argue that larger page sizes are better. For the same working set, smaller page sizes imply a larger number of pages per application, each of which requires a page table entry to enable translation. This fundamentally requires larger page tables, irrespective of the structure of the page tables (although the exact overhead depends on the design of the page table entry itself, which I'll discuss later). Imagine having, for example, 4-byte pages and typical working sets of tens of KBs. Then most of the physical memory would actually be allocated to hold the page tables, not the data. Paging out page tables to secondary storage leads to double page faults for individual memory references, which would be absolutely terrible for performance. Such a design is obviously ridiculous.
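A rough back-of-the-envelope calculation with my own numbers makes that overhead visible. For a 64 KB working set and 4-byte page table entries:

4-byte pages:  64 KB / 4 B  = 16384 pages -> 16384 * 4 B = 64 KB of page table entries (100% overhead)
4 KB pages:    64 KB / 4 KB = 16 pages    -> 16 * 4 B    = 64 B of page table entries  (~0.1% overhead)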

Essentially, the page size shouldn't be (much) smaller than the smallest working set size that could plausibly ever occur. This immediately rules out pages of sizes 2^0-2^6, leaving us with 8 options. The papers of the 70s and early 80s that study page sizes mostly study only these 8 options.

There is another reason that makes larger page sizes advantageous. One of the benefits of virtual memory is to be able to transparently use secondary storage, in addition to main memory, to hold volatile data. However, secondary storage devices are mechanical and perform best with bulk transfers. This is more of a guideline, really, and we should not rule out any page sizes yet. There are devices with different designs and characteristics, and eventually the performance advantage of bulk transfers saturates at some point. But it's certainly something to take into account when measuring the impact of page sizes on performance. If the type of applications being considered exhibits little spatial locality, then smaller page sizes would still be preferable, because copying extra bytes to or from the disk is not exactly free.

Spatial locality of reference encourages the use of certain page sizes. For very small page sizes, it's highly likely that all bytes in the page will be used within a small period of time. As the size of a page gets larger, the number of bytes that are less likely to be used increases. However, for very large pages, all of the working set may fit within a small number of pages irrespective of locality. Therefore, to minimize the number of page faults, the page size should be either very small or very large, but not somewhere in between. But ultimately, this varies from one application to another. Emerging programming paradigms, such as object-oriented programming and functional programming, and techniques such as multithreading, may actually reduce spatial locality due to the extensive use of linked structures and due to the way different applications interact with each other. Larger page sizes are preferable to reduce the number of page faults. However, smaller page sizes might be better for memory shared between multiple applications, to reduce internal fragmentation of the shared pages. It has also been shown experimentally that the number of page frames that can be held in main memory is correlated with the number of page faults, favoring smaller page sizes.

The need for TLBs was well recognized at that time. Intel called them page caches in their patents (not sure whether Intel was first to use that term). Fast TLBs are very important because address translation is on the critical path of executing instructions. To make them as fast as possible, they should be made tiny, but then they can only cache a small number of page table entries. This motivates the use of larger page sizes.

In the search for the optimal page size, it turns out that there isn't one. It was known at the time that there is a need for supporting multiple page sizes. In fact, the Multics system, designed in 1965, supported 64- or 1,024-word pages (a word is 36 bits in size). Page sizes in the range 2^7-2^14 were shown to be optimal, both empirically and theoretically, in different scenarios. Intel must have observed that 4-KB pages resulted in the best average performance for the applications that their customers used in the 80s. For today's computers and applications, such tiny differences in page sizes don't make as much of a difference as they did in the 70s and 80s.

The Design of the Page Table Entry of the Intel 80386

The design of the page directory entry and page table entry is discussed in detail in an Intel patent. This is important because the size of the page table entry and the overall structure of the page table were taken into account in many studies about the page size because they all interact with each other. I prefer not to discuss this in more detail to keep the answer short.

The Page Size of the Near Future

Intel was granted a patent a couple of months ago that apparently proposes a system with a default page size of 64 KB, but at the same time supporting 4 KB pages (and other IA-32e page sizes) for backward compatibility. I quote from the patent:

As a result of support of the mapping of the 64 KB page into 4 KB subpages, VA64 directly supports all currently defined operations on 4 KB pages, including independent protection bits per 4 KB page and arbitrary 4 KB-aligned address mappings. VA64 also supports OS kernel page management on 4 KB boundaries, even when the OS kernel allocates memory in 64 KB units. As a result of support of large pages, VA64 supports all divisions of the virtual address range into pages that an existing paging system such as Intel Corporation's IA-32e paging system supports. Therefore, VA64 supports applications and hardware devices that work with a 4 KB-page Windows® OS kernel, while also taking full advantage of 64 KB pages when 64 KB pages can be used.

The capabilities of VA64 can be adopted gradually by the OS kernel, rather than requiring them all to be supported in the first generation VA64-capable OS kernel. For example, a VA64-capable OS kernel could start by mapping all pages to current sizes (e.g., 4 KB/2 MB/1 GB in Intel Corporation's IA-32e paging system), but changing to a new page table format. After the change in page table format, the OS kernel could be modified to map virtual memory in 64 KB units and change to store 64 KB pages in its free list. Then the OS kernel could start using 64 KB pages whenever alignment and protections permit, and add support for other VA64 capabilities.

I don't know anything else about the VA64 paging system other than what's written in the patent. There is nothing on it anywhere on the Internet. I guess we'll find out more soon.

Selected References

Denning, P.J. (1970). Virtual memory. ACM Computing Surveys Volume 2 Issue 3, 153-189.

Gelenbe, E., Tiberio, P., & Boekhorst, J. C. A. (1973). Page size in demand-paging systems. Acta Informatica, 3(1), 1-23.

Alanko, T. O., & Verkamo, A. I. (1983). Segmentation, paging and optimal page sizes in virtual memory. Performance Evaluation, 3(1), 13-33.

Corbató, F. J., & Vyssotsky, V. A. (1965). Introduction and overview of the Multics system. In Proceedings of the November 30--December 1, 1965, fall joint computer conference, part I (pp. 185-196).


Footnotes

(1) Actually, the size of a single virtual page can be a multiple of the size of a page frame, but let's not go there.

(2) We can generalize the formulation and use the term "word" to refer to the smallest addressable unit of memory rather than byte, but that is not important here.

(3) Maybe not programmers, depending on the programming language, but compilers, linkers, assemblers, and tools that work with binary code.

(4) It's certainly possible to design a system that supports using paging and nonpaging at the same time, thereby potentially supporting multiple units of allocation, but let's not go there.



Answer 5:

I got this from the Wikipedia article, and I quote:

Page size is usually determined by processor architecture

Check out the article below:

http://en.wikipedia.org/wiki/Page_(computer_memory)



Answer 6:

I'm adding this answer/comment because the calculation by sleepsort is very interesting; I have to add it to my webpage (mentioning the source, of course). A (possible) answer to the question of how you can calculate the page size programmatically can be found here. The way it is calculated, as shown by sleepsort, is very interesting, so I did the same for x86_64, and the result wasn't what I expected. Reading further on memory management for x86_64, I found that for 64-bit Intel, 16 bits of the address are not used, leaving 48 bits for us to calculate: 9 bits for the top level of the page-table tree, another 9 bits for the next level, another 9 bits for the level after that, and finally 9 bits for the leaves of the last level. So 48 - 36 = 12 bits for the offset within a memory page. 2^12 = 4096, as mentioned before by sleepsort. I just wonder how many levels (passes) this architecture uses, 3 or 4 (if anyone can explain that). Following sleepsort's calculation:

2^x * 2^x * 2^x * 2^x * 2^y = 2^48
4x + y = 48

This time we have 8 bytes for each address (8 bytes * 8 bits/byte = 64 bits), so:

2^y / 2^3 = 2^x
y - 3 = x
Substituting into the previous equation:

4(y - 3) + y = 48
4y - 12 + y = 48
5y = 60
y = 12
Because 2^y represents the page size, the page size = 2^12 = 4096 bytes.
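For completeness on the "calculate the page size programmatically" part: rather than deriving it, you can also just query it at run time. A minimal sketch using the standard POSIX call:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);   /* 4096 on typical x86/x86-64 Linux */
    printf("page size: %ld bytes\n", page);
    return 0;
}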

That leaves me with the question: what about those huge pages in Linux?