Reading huge files using memory-mapped files

Posted 2020-02-14 02:09

I see many articles advising against memory-mapping huge files, so that the virtual address space isn't consumed entirely by the mapping.

How does that change for a 64-bit process, where the address space is dramatically larger? If I need random access into a file, is there a reason not to map the whole file at once? (The file is dozens of GBs.)

3 Answers
聊天终结者 · 2020-02-14 02:56

One thing to be aware of is that memory mapping requires a big contiguous chunk of (virtual) address space when the mapping is created; on a 32-bit system this particularly sucks, because on a loaded system getting a long run of contiguous free address space is unlikely and the mapping will fail. On a 64-bit system this is much easier, as the upper bound of a 64-bit address space is... huge.

If you are running code in controlled environments (e.g. 64-bit server environments that you build yourself and know will run this code just fine), go ahead and map the entire file and just deal with it.

If you are trying to write general-purpose code that will ship in software running on any number of configurations, you'll want to stick to a smaller, chunked mapping strategy: for example, map large files as a collection of 1 GB chunks, with an abstraction layer that takes an operation like read(offset), converts the offset into the right chunk, and performs the operation there (see the sketch below).
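A minimal sketch of that idea in C for POSIX systems. The chunked_file struct, the function names, the lazy per-chunk mapping, and the thin error handling are all illustrative choices, not from any particular library:

    /* chunked_file: map a large file as a collection of 1 GB mappings. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define CHUNK_SIZE ((size_t)1 << 30)          /* 1 GB per mapping */

    typedef struct {
        int     fd;
        size_t  file_size;
        size_t  nchunks;
        void  **chunks;        /* NULL until a chunk is first touched */
    } chunked_file;

    int cf_open(chunked_file *cf, const char *path) {
        struct stat st;
        cf->fd = open(path, O_RDONLY);
        if (cf->fd < 0 || fstat(cf->fd, &st) < 0) return -1;
        cf->file_size = (size_t)st.st_size;
        cf->nchunks   = (cf->file_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
        cf->chunks    = calloc(cf->nchunks, sizeof *cf->chunks);
        return cf->chunks ? 0 : -1;
    }

    /* read(offset): find the right chunk, map it on first use, copy out.
     * Reads that straddle a chunk boundary are split across iterations. */
    ssize_t cf_read(chunked_file *cf, size_t off, void *buf, size_t len) {
        size_t done = 0;
        if (off >= cf->file_size) return 0;
        if (len > cf->file_size - off) len = cf->file_size - off;
        while (done < len) {
            size_t ci    = (off + done) / CHUNK_SIZE;
            size_t coff  = (off + done) % CHUNK_SIZE;
            size_t csize = (ci == cf->nchunks - 1)
                         ? cf->file_size - ci * CHUNK_SIZE
                         : CHUNK_SIZE;
            if (!cf->chunks[ci]) {
                void *p = mmap(NULL, csize, PROT_READ, MAP_PRIVATE,
                               cf->fd, (off_t)(ci * CHUNK_SIZE));
                if (p == MAP_FAILED) return -1;
                cf->chunks[ci] = p;
            }
            size_t n = csize - coff;
            if (n > len - done) n = len - done;
            memcpy((char *)buf + done, (char *)cf->chunks[ci] + coff, n);
            done += n;
        }
        return (ssize_t)done;
    }

A caller then just does cf_read(&cf, offset, buf, len) and never has to care which 1 GB mapping actually backs that offset.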

Hope that helps.

家丑人穷心不美 · 2020-02-14 03:04

There's a reason to think carefully about using memory-mapped files, even on a 64-bit platform (where virtual address space size is not an issue). It's related to (potential) error handling.

When reading the file "conventionally", any I/O error is reported by the appropriate function's return value. The rest of the error handling is up to you.
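For example, with a plain POSIX read the error is visible right at the call site (the file name here is just illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[4096];
        int fd = open("huge.bin", O_RDONLY);  /* illustrative file name */
        if (fd < 0) { perror("open"); return 1; }
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {                          /* the error is explicit here */
            perror("read");
            return 1;
        }
        printf("read %zd bytes\n", n);
        close(fd);
        return 0;
    }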

OTOH, if the error arises during implicit I/O (resulting from a page fault and the attempt to load the needed portion of the file into the appropriate memory page), the error-handling mechanism depends on the OS.

On Windows the error handling is performed via SEH, so-called "structured exception handling". The exception propagates to user mode (the application's code), where you have a chance to handle it properly. Proper handling requires compiling with the appropriate exception-handling settings (to guarantee the invocation of destructors, if applicable).
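A minimal sketch of that Windows pattern in C (MSVC-specific __try/__except; the file name and the way the failure is reported to the caller are illustrative):

    #include <windows.h>
    #include <stdio.h>

    /* Returns 1 on success, 0 if the demand-paged read from disk failed. */
    int read_mapped_byte(const unsigned char *view, SIZE_T off,
                         unsigned char *out) {
        __try {
            *out = view[off];   /* may fault: the page is loaded on demand */
            return 1;
        } __except (GetExceptionCode() == EXCEPTION_IN_PAGE_ERROR
                        ? EXCEPTION_EXECUTE_HANDLER
                        : EXCEPTION_CONTINUE_SEARCH) {
            return 0;           /* the implicit I/O behind the access failed */
        }
    }

    int main(void) {
        HANDLE f = CreateFileA("huge.bin", GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
                               NULL);
        if (f == INVALID_HANDLE_VALUE) return 1;
        HANDLE m = CreateFileMappingA(f, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!m) return 1;
        const unsigned char *view = MapViewOfFile(m, FILE_MAP_READ, 0, 0, 0);
        if (!view) return 1;

        unsigned char b;
        if (read_mapped_byte(view, 0, &b))
            printf("first byte: %u\n", b);
        else
            fprintf(stderr, "I/O error while paging in the mapped file\n");

        UnmapViewOfFile(view);
        CloseHandle(m);
        CloseHandle(f);
        return 0;
    }

EXCEPTION_IN_PAGE_ERROR is the exception code Windows raises when the system cannot load the page backing the access.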

I don't know how the error handling is performed on Unix/Linux, though.

P.S. I'm not saying don't use memory-mapped files; I'm saying do it carefully.

放荡不羁爱自由 · 2020-02-14 03:16

On 64-bit, go ahead and map the file.

One thing to consider, based on Linux experience: if the access is truly random and the file is much bigger than you can expect to cache in RAM (so the chance of hitting a page again is slim), then it can be worth specifying MADV_RANDOM to madvise, to stop the file pages you hit from steadily accumulating and pointlessly swapping other, actually useful stuff out. No idea what the Windows equivalent API is, though.
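A minimal sketch of that on Linux (file name illustrative, error handling kept thin):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("huge.bin", O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

        size_t len = (size_t)st.st_size;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hint that the access pattern is random: the kernel then reads
         * the minimum per fault instead of doing readahead. */
        if (madvise(p, len, MADV_RANDOM) != 0)
            perror("madvise");

        /* Touch a few arbitrary offsets (rand() stands in for whatever
         * "truly random" access the application does); only those pages
         * get faulted in. */
        unsigned long sum = 0;
        for (int i = 0; i < 10; i++)
            sum += p[(size_t)rand() % len];
        printf("sum of sampled bytes: %lu\n", sum);

        munmap(p, len);
        close(fd);
        return 0;
    }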
