I've been trying to understand how mmap() works with disk-backed files. I mostly get it, but I still have one question.
Suppose a master process forks a number of worker child processes, and they all use a read-only, file-backed mmapped db. Does it matter whether the mmap() happens in the master process before the forks or in each child process after its fork?
My understanding is that if the mmap() happens in the master process before the fork, the mapped pages start out not present in the page table, so the first read of each page causes a page fault, which triggers the kernel to load the page from disk (or from the page cache). After the fork, once one child has read a page, that page is already in memory, so the other children can read it without causing a major page fault.
But if the mmap() happens in the child processes after the fork, do the other worker children still get the benefit of sharing those loaded pages? Are they all, in effect, using the same underlying mmap? Or does each worker child have to trigger a page fault and load each page itself?
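
To make the two arrangements concrete, here is a rough sketch of what I mean, written in Python for brevity. The helper names (`make_db`, `open_map`, `run_workers`) and the throwaway temp file are my own inventions for illustration, not part of any real setup. It only shows where the mmap() call sits relative to the fork; it does not measure page faults.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_db(num_pages=4):
    """Create a small throwaway file standing in for the read-only db."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * (PAGE * num_pages))
    return path


def open_map(path):
    """Map the whole file read-only (length 0 means 'entire file')."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def run_workers(path, n_workers, map_in_master):
    """Fork n_workers children; each reads one byte from every page.

    map_in_master=True  -> one mmap() in the master, inherited across fork.
    map_in_master=False -> each child calls mmap() itself after the fork.
    Returns the children's exit codes (0 means every read succeeded).
    """
    shared = open_map(path) if map_in_master else None
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Child: use the inherited mapping, or create its own.
                m = shared if map_in_master else open_map(path)
                # Touch the first byte of each page in the mapping.
                ok = all(m[off] == ord("x") for off in range(0, len(m), PAGE))
                status = 0 if ok else 1
            finally:
                os._exit(status)  # never fall back into the master's code
        pids.append(pid)
    codes = [os.WEXITSTATUS(os.waitpid(pid, 0)[1]) for pid in pids]
    if shared is not None:
        shared.close()
    return codes


if __name__ == "__main__":
    db = make_db()
    print("mmap before fork:", run_workers(db, 3, map_in_master=True))
    print("mmap after fork: ", run_workers(db, 3, map_in_master=False))
    os.remove(db)
```

Both variants read the same bytes successfully; my question is only about whether the page-fault and page-sharing behavior differs between them.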