I'm working on porting some code from AIX to Linux. Parts of the code use the shmat()
system call to create new files. When used with SHM_MAP
in a writable mode, one can extend the file beyond its original length (of zero, in my case):
When a file is mapped onto a segment, the file is referenced by accessing the segment. The memory paging system automatically takes care of the physical I/O. References beyond the end of the file cause the file to be extended in page-sized increments. The file cannot be extended beyond the next segment boundary.
(A "segment" in AIX is a 256 MB chunk of address space, and a "page" is usually 4 KB.)
What I would like to do on Linux is the following:
- Reserve a large-ish chunk of address space (it doesn't have to be as big as 256 MB, these aren't such large files)
- Set up the page protection bits so that a segfault is generated on the first access to a page that hasn't been touched before
- On a page fault, clear the "cause a page fault" bit and allocate committed memory for the page, allowing the write (or read) that caused the page fault to proceed
- Upon closing the shared memory area, write the modified pages to a file
I know I can do this on Windows with the VirtualProtect function, the PAGE_GUARD
memory protection bit, and a structured exception handler. What is the corresponding method on Linux to do the same? Is there perhaps a better way to implement this extend-on-write functionality on Linux?
I've already considered:
- using mmap() with some fixed large-ish size, but I can't tell how much of the file was written to by the application code
- allocating an anonymous shared memory area of large-ish size, but again I can't tell how much of the area has been written

mmap() by itself does not seem to provide any facility to extend the length of the backing file.
Naturally I would like to do this with only minimal changes to the application code.
This is very similar to a homework assignment I once did. Basically I had a list of "pages" and a list of "frames", with associated information. Catching SIGSEGV, I would handle the faults and alter the memory protection bits as necessary. I'll include the parts that you may find useful.
Create mapping. Initially it has no permissions.
int w_create_mapping(size_t size, void **addr)
{
    /* Reserve `size` pages of address space, initially inaccessible. */
    *addr = mmap(NULL,
                 size * w_get_page_size(),
                 PROT_NONE,
                 MAP_ANONYMOUS | MAP_PRIVATE,
                 -1,
                 0);

    if (*addr == MAP_FAILED) {
        perror("mmap");
        return FALSE;
    }
    return TRUE;
}
Install signal handler
int w_set_exception_handler(w_exception_handler_t handler)
{
    static struct sigaction sa;
    static struct sigaction previous_action;    /* saved so it could be restored later */

    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGSEGV);
    sa.sa_flags = SA_SIGINFO;

    if (sigaction(SIGSEGV, &sa, &previous_action) < 0)
        return FALSE;
    return TRUE;
}
Exception handler
static void fault_handler(int signum, siginfo_t *info, void *context)
{
    void *address;      /* the memory location which caused the fault */

    (void)signum;       /* unused */
    (void)context;      /* unused */

    address = info->si_addr;
    if (FALSE == page_fault(address)) {
        _exit(1);
    }
}
Increasing protection
int w_protect_mapping(void *addr, size_t num_pages, w_prot_t protection)
{
    int prot;

    switch (protection) {
    case PROTECTION_NONE:
        prot = PROT_NONE;
        break;
    case PROTECTION_READ:
        prot = PROT_READ;
        break;
    case PROTECTION_WRITE:
        prot = PROT_READ | PROT_WRITE;
        break;
    default:
        return FALSE;   /* unknown value; prot would be uninitialized */
    }

    if (mprotect(addr, num_pages * w_get_page_size(), prot) < 0)
        return FALSE;
    return TRUE;
}
I can't publicly make it all available since the team is likely to use that same homework again.
Allocate a big buffer however you like, then use the mprotect() system call to make the tail of the buffer read-only, and register a signal handler for SIGSEGV to note where in the buffer writes have been attempted; in the handler, call mprotect() again to enable writes and let the faulting instruction proceed.
- http://linux.die.net/man/2/mprotect
I've contemplated similar things myself, and haven't found any way for mmap() to extend the backing file either.
Currently, I plan on trying two alternatives:
- manually managing the file size, extending the file myself and mremap()'ing afterwards
- creating a sparse file and hoping that the VM will allocate the needed sectors when flushing dirty pages

Honestly, I don't think sparse files would work, but it's worth a try.