On a single core computer, one thread is executing at a time. On each context switch the scheduler checks if the new thread to schedule is in the same process than the previous one. If so, nothing needs to be done regarding the MMU (pages table). In the other case, the pages table needs to be updated with the new process pages table.
I am wondering how things happen on a multi-core computer. I guess there is a dedicated MMU on each core, and if two threads of the same process are running simultaneously on 2 cores, each of this core's MMU simply refer to the same page table. Is this true ? Can you point me to good references on the subject ?
Sorry for previous answer. Deleted the answer.
TI PandaBoard runs on OMAP4430 Dual Cortex A9 Processor. It has one MMU per core. It has 2 MMU for 2 cores.
http://forums.arm.com/index.php?/topic/15240-omap4430-panda-board-armcortex-a9-mp-core-mmu/
The above thread provides the info.
In Addition , some more information on ARM v7
Each core has the following features:
The dual core configuration is completed by a common set of components:
Though all these are for ARM , it will provide general idea.
Answers here so far seem to be unaware of the existence of the Translation Lookaside Buffer (TLB), which is the MMU's way of converting the virtual addresses used by a process to a physical memory address.
Note that these days the TLB itself is a complicated beast with multiple levels of caching. Just like a CPU's regular RAM caches (L1-L3), you wouldn't necessarily expect it's state at any given instant to contain info exclusively about the currently running process but for that to be moved in piecemeal on demand; see the Context Switch section of the wikipedia page.
On SMP, all processors' TLBs need to keep a consistent view of the system page table. See e.g this section of the linux kernel book for one way of handling it.
On the question of MMUs per processor there may be several. The assumption is that each MMU will add additional memory bandwidth. If DDR3-12800 memory allows 1600 mega-transfers per second on a processor with one MMU then one with four will theoretically allow 6400. Securing the bandwidth to the cores available is probably quite a feat. The bandwidth advertised will be whittled away quite a bit in the process.
The number of MMUs on a processor is independent of the number of cores on it. The obvious examples are the 16 core CPUs from AMD, they definitely don't have 16 MMUs. A dual-core processor, on the other hand, might have two MMUs. Or just one. Or three?
Edit
Maybe I'm confusing MMUs with channels?
TL;DR - There is a seperate MMU per CPU, but an MMU generally has several LEVELS of page tables and these may be shared.
For instance, on an ARM the top level (PGD or page global directory name used in Linux) covers 1MB of address space. In simple systems, you can map in 1MB sections. However, this normally points to a 2nd level table (PTE or page table entry).
One way to implement multi-CPU efficiently is to have a separate top level PGD per CPU. The OS code and data will be consistent between cores. Each core will have it's own TLB and L1-cache; L2/L3 caches maybe shared or may not. The maintenance of data/code caches depend on whether they are VIVT or VIPT, but that is a side issue and shouldn't effect the use of MMU and multi-core.
The process or user portion of the 2nd level page tables remain the same per process; otherwise they would have different memory or you would need to synchronize redundant tables. Individual cores may have different sets of 2nd level page tables (different top level page table pointer) when they run different processes. If it is multi-threaded, and running on two CPUs then the top level table may contain the same 2nd level page table entries for the process. In fact, the entire top level page table maybe identical (but different memory) when two CPUs run the same process. If thread local data is implemented with an MMU a single entry could differ. However, thread local data is usually implemented in other ways due to TLB and cache issue (flushing/coherency).
The image below may help. The CPU, PGD and PTE entries in the diagram are sort of like pointers.
The dashed line is the only difference between running different processes and the same processes (multi-threading case) with the MMU; it is an alternate to the solid line running from the CPU2 PGD to the process B PTE or 2nd level page table. The kernel is always a multi-threaded CPU application.
When a virtual address is translated, different bit portions are indexes into each table. If a virtual address is not in the TLB, then the CPU must do a table walk (and fetch different table memory). So a single read of a process memory would result in three memory accesses (if the TLB wasn't present).
The access permission of the kernel code/data are obviously different. In fact, there will probably be other issues such as device memory, etc. However, I think the diagram should make it obvious how the MMU manages to keep multi-threaded memory the same.
It is entirely possible that an entry in the 2nd level table could be different per thread. However, this would incur a cost when switching threads on the same CPU so normally data for all 'thread locals' is mapped and some other way to select the data is used. Normally the thread local data is found via a pointer or index register (special per CPU) which is mapped/points to data inside the 'process' or user memory. 'thread local data' is not isolated from other threads, so if you have a memory overwrite in one thread you could kill another threads data.
AFAIK there is a single MMU per physical processor, at least in SMP systems, so all cores share a single MMU.
In NUMA systems each core has a separate MMU, because each core has its own private memory.
Take a look at this scheme. This is high level view of all that there is in a single core on a Corei7 cpu. The picture has been taken from Computer Systems: A Programmer's Perspective, Bryant and Hallaron. You can have access to diagrams in here, section 9.21.