By examining several DLLs on my Windows machine (for instance KERNEL32.DLL), I've noticed that none of their sections, not even the read-only data section, have the IMAGE_SCN_MEM_SHARED flag set.
DLLs are mapped from the .dll file, so a page is only copied to physical memory when it is first read. Still, if the same page of, let's say, kernel32.dll is accessed by both process A and process B, then the page will exist twice in physical memory. I am asking about the veracity of this last statement.
If the .text or the .rodata segment were shared, they would get copied to physical memory only once, even when ASLR is enabled. What ASLR does is randomize the base of a module when it is first loaded (with the corresponding relocations applied), but the next process that loads this module before the next system restart will get the module at the same address, so .text and .rodata could be shared in the same manner.
These are all assumptions I made, please correct me.
Thanks!
The OS will definitely be able to map multiple virtual addresses to the same physical memory page, as long as the page content does not (need to) change [in different ways for different processes]. However, if the code uses an absolute address (either internally or externally to the DLL), for example a vtable/function pointers, pointers to global data (constant or non-constant) or simply function calls with absolute addresses, the address must be modified to match the actual address given by the OS to that section of memory. This is called "relocation".
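To make "relocation" concrete, here is a minimal sketch of the patching step. It is not the real Windows loader: the function name `apply_relocations`, the flat list of offsets and the 32-bit pointer width are all assumptions made for illustration.

```python
import struct


def apply_relocations(image, reloc_offsets, preferred_base, actual_base):
    """Patch each 32-bit absolute address in `image` for the new base.

    `reloc_offsets` is a hypothetical, simplified relocation table:
    just the byte offsets of the fields that hold absolute addresses.
    """
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)


# Toy image: 16 bytes with one absolute pointer stored at offset 4.
preferred_base = 0x10000000
image = bytearray(16)
struct.pack_into("<I", image, 4, preferred_base + 0x2000)

apply_relocations(image, [4], preferred_base, actual_base=0x6A000000)
patched = struct.unpack_from("<I", image, 4)[0]
print(hex(patched))  # 0x6a002000
```

Only the bytes at the listed offsets change; everything else in the image is identical for every base address, which is what makes sharing possible at all.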
So, at least in theory, you can share the same DLL even with address space randomization; it just requires a little more work from the compiler and/or programmer. In particular, it requires that there are no relocations (in large chunks of the code). If the code has absolute addresses that are relocated based on the code address, then there will need to be one copy per base address the DLL is loaded at.
I don't actually know how the OS deals with this. A simple solution is obviously to randomize the address only once per DLL (until that particular DLL is unloaded), regardless of how many applications use the same DLL. It still makes it rather hard for an outsider to know what address the DLL is loaded at, since the address will be different each time the DLL is loaded afresh (and more importantly, it will not be a static value for ALL machines using the same version of the OS, which would be the case without this feature). It does, however, mean that long-running processes can be "inspected" by copying content with known values from, for example, the stack. Web servers, database servers and system services are typically long-running processes, and as such will get different addresses only when the system is "shut down" (or at least when the long-running process is restarted).
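As a rough sketch of that "randomize once per DLL" idea (an illustration only, not a description of how Windows actually does it), the bookkeeping could be as small as a table keyed by DLL name. The class name `BootWideBases`, the address range and the 64 KB alignment are assumptions, and the sketch ignores collisions between different DLLs.

```python
import random

GRANULARITY = 0x10000  # assumed 64 KB alignment for image bases


class BootWideBases:
    """Hypothetical table that picks a random base once per DLL."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._bases = {}

    def base_for(self, dll_name):
        key = dll_name.lower()
        if key not in self._bases:
            # First load since "boot": choose a random aligned base.
            self._bases[key] = self._rng.randrange(0x1000, 0x7000) * GRANULARITY
        # Every later process gets the same base, so pages can be shared.
        return self._bases[key]


table = BootWideBases()
base_in_a = table.base_for("kernel32.dll")  # process A loads it first
base_in_b = table.base_for("KERNEL32.DLL")  # process B loads it later
print(base_in_a == base_in_b)  # True
```

Because both processes see the same base, the relocated pages are byte-for-byte identical and one physical copy can serve both.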
The second, slightly trickier version is to check whether a particular page (typically a 4KB region of memory) has relocations, and share all the pages that have none. Relocated pages need to have one copy per base address. It is typical to have "all references to external resources" in one block in DLLs (a "thunk section"), so the bulk of a typical DLL would not need relocating regardless of the base address of the code, which means that this is definitely a workable solution.
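The per-page check can be sketched like this. The function name `split_pages`, the plain list of relocation RVAs and the 4-byte patch width are assumptions for the example; a real relocation table is organised differently.

```python
PAGE_SIZE = 4096


def split_pages(image_size, reloc_rvas, patch_width=4):
    """Return (shareable, private) page indices for an image.

    A page is private if any relocated field touches it; a field that
    straddles a page boundary marks both pages.
    """
    total_pages = (image_size + PAGE_SIZE - 1) // PAGE_SIZE
    private = set()
    for rva in reloc_rvas:
        private.add(rva // PAGE_SIZE)
        private.add((rva + patch_width - 1) // PAGE_SIZE)
    shareable = [p for p in range(total_pages) if p not in private]
    return shareable, sorted(private)


# Five-page image with three relocated fields, one straddling pages 1 and 2.
shareable, private = split_pages(5 * PAGE_SIZE, [0x10, 0x1FFE, 0x4100])
print(shareable, private)  # [3] [0, 1, 2, 4]
```

In a DLL whose absolute addresses are concentrated in a few pages, the shareable list would cover most of the image.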
If neither of these schemes "work" in the OS, then you have to load the same DLL multiple times. This clearly works from the perspective of the OS anyway, as prior to ASLR the base address of a DLL had to be moved whenever two DLLs tried to load at the same address (for example DLLs produced by different vendors that happen to pick the same base address for the code, or the classic and common "I never gave a base address, so it uses the default address"). The OS resolves such conflicts by changing the base address of the one loaded later, since the first one already occupies the address.
As to the meaning of IMAGE_SCN_MEM_SHARED, I would have thought that the developer would request this explicitly, whereas the sharing of pages in a DLL described above is done automatically. In other words, IMAGE_SCN_MEM_SHARED will be set by the developer of a particular DLL or EXE to signify that the content should be shared with other users of the same content, rather than "the OS can share it if it can be done without the user of the content noticing". The latter is certainly the case for sharing code, while (writeable) data is typically not shared between the processes using a DLL. Read-only data, as long as it has no relocations, can of course implicitly be shared [the user of that content cannot tell if it is shared or not].
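For reference, a small sketch of decoding the memory bits of a section's `Characteristics` field. The four flag values are the ones defined in winnt.h; the helper name `memory_flags` and the sample values are assumptions chosen to look like a typical code section and an explicitly shared data section.

```python
# Section memory flags as defined in winnt.h.
FLAGS = {
    "IMAGE_SCN_MEM_SHARED": 0x10000000,
    "IMAGE_SCN_MEM_EXECUTE": 0x20000000,
    "IMAGE_SCN_MEM_READ": 0x40000000,
    "IMAGE_SCN_MEM_WRITE": 0x80000000,
}


def memory_flags(characteristics):
    """Return the names of the memory flags set in a Characteristics value."""
    return [name for name, bit in FLAGS.items() if characteristics & bit]


# Sample values: code + execute + read, and a shared read/write data section.
text_flags = memory_flags(0x60000020)
shared_data_flags = memory_flags(0xD0000040)
print(text_flags)
print("IMAGE_SCN_MEM_SHARED" in shared_data_flags)  # True
```

A code section like the first sample has no shared bit, yet nothing stops the OS from backing it with the same physical pages in several processes; the flag only matters when the developer wants writes to be visible to everyone.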