What is the downside of updating ARM TTBR(Translat

This question is related to this one: While "fork"ing a process, why does Linux kernel copy the content of kernel page table for every newly created process?

I found that Linux kernel tries to avoid updating TTBR when switching between user land and kernel land by copying the content of swapper page table into every newly created page table in function pgd_alloc. Question is: What is the downside of updating ARM TTBR?

Updating the TTBR (translation table base register)^Note1 with the MMU enables has many perils. There are interrupts, page faults, TLB (MMU-cache) and both L1 and L2 caches to be considered. On different systems, the caches maybe PIPT or VIVT (physically or virtually tagged), there may or may not exist L1 nor L2 caches.

People seem overly concerned about the MMU and TLB for efficiency. They are always dwarfed by the primary L1/L2 caches in performance considerations. It is a smaller impact to update the MMU tables and perform TLB flushes than it is to have un-needed evictions from the L1/L2 code and data caches. At a minimum a TLB is worth 4KB or over 100 cache lines. In some cases, the TLB entry maybe 1MB.

Some data/code in the L1/L2 user space may need to be evicted on context switches. However, for very frequent small work-loads, a user context switch may keep code and data in the L1/L2. For example a media player doing large CPU intensive decoding and some cron task checking to see no new email is on a server. The switch to and back from the 'cron' task may result in code remaining in the L2 cache for the video decoding to use.

What is the downside of updating ARM TTBR?

Unless the from/to tables are identical you have to keep the system view of memory consistent for the duration of the update.^Note2 This will naturally cause IRQ latency and complexity of implementation as you need to sync up many sub-systems. Also, the Linux MM (memory management) code is architecture agnostic. It handles a great variety of MMU sub-systems. The goal is never to optimize locally (at the architecture level) but optimize globally at the generic layers.

Note1: The TTBR is a pointer to a physical 16k aligned memory region that is the first level of the ARM MMU. Each entry is 1MB (on 32bit systems) and may point to another table; often called L2.

Note2: You might do this in a boot loader or places where you are migrating system level code between memory devices. Ie, update the TTBR with identical tables is of no consequence by itself. It is when the tables differ that weird things will happen.