How to provide extend-on-write functionality for m

Posted 2019-03-29 09:26

Question:

I'm working on porting some code from AIX to Linux. Parts of the code use the shmat() system call to create new files. When used with SHM_MAP in a writable mode, one can extend the file beyond its original length (of zero, in my case):

When a file is mapped onto a segment, the file is referenced by accessing the segment. The memory paging system automatically takes care of the physical I/O. References beyond the end of the file cause the file to be extended in page-sized increments. The file cannot be extended beyond the next segment boundary.

(A "segment" in AIX is a 256 MB chunk of address space, and a "page" is usually 4 KB.)

What I would like to do on Linux is the following:

  • Reserve a large-ish chunk of address space (it doesn't have to be as big as 256 MB, these aren't such large files)
  • Set up the page protection bits so that a segfault is generated on the first access to a page that hasn't been touched before
  • On a page fault, clear the "cause a page fault" bit and allocate committed memory for the page, allowing the write (or read) that caused the page fault to proceed
  • Upon closing the shared memory area, write the modified pages to a file

I know I can do this on Windows with the VirtualProtect function, the PAGE_GUARD memory protection bit, and a structured exception handler. What is the corresponding method on Linux to do the same? Is there perhaps a better way to implement this extend-on-write functionality on Linux?

I've already considered:

  • using mmap() with some fixed large-ish size, but I can't tell how much of the file was written to by the application code
  • allocating an anonymous shared memory area of large-ish size, but again I can't tell how much of the area has been written
  • mmap() by itself does not seem to provide any facility to extend the length of the backing file

Naturally I would like to do this with only minimal changes to the application code.

Answer 1:

This is very similar to a homework assignment I once did. Basically, I had a list of "pages" and a list of "frames", with associated information. Using SIGSEGV, I would catch faults and alter the memory protection bits as necessary. I'll include the parts that you may find useful.

Create mapping. Initially it has no permissions.

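/* size is a page count, not a byte count. */
int w_create_mapping(size_t size, void **addr)
{
    /* Reserve address space with no access rights, so that the first
       touch of any page raises SIGSEGV. */
    *addr = mmap(NULL,
            size * w_get_page_size(),
            PROT_NONE,
            MAP_ANONYMOUS | MAP_PRIVATE,
            -1,
            0
    );

    if (*addr == MAP_FAILED) {
        perror("mmap");
        return FALSE;
    }

    return TRUE;
}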

Install signal handler

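/* Receives the SIGSEGV disposition that was in place before ours. */
static struct sigaction previous_action;

int w_set_exception_handler(w_exception_handler_t handler)
{
    static struct sigaction sa;
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGSEGV);
    /* SA_SIGINFO selects the three-argument handler, which receives
       the faulting address in siginfo_t. */
    sa.sa_flags = SA_SIGINFO;

    if (sigaction(SIGSEGV, &sa, &previous_action) < 0)
        return FALSE;

    return TRUE;
}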

Exception handler

static void fault_handler(int signum, siginfo_t *info, void *context)
{
    void *address;      /* the address that faulted */

    /* Memory location which caused fault */
    address = info->si_addr;

    if (FALSE == page_fault(address)) {
        _exit(1);
    }
}

Increasing protection

int w_protect_mapping(void *addr, size_t num_pages, w_prot_t protection)
{
    int prot;

    switch (protection) {
    case PROTECTION_NONE:
        prot = PROT_NONE;
        break;
    case PROTECTION_READ:
        prot = PROT_READ;
        break;
    case PROTECTION_WRITE:
        prot = PROT_READ | PROT_WRITE;
        break;
    default:
        /* Unknown value: fail rather than pass an uninitialized prot. */
        return FALSE;
    }

    if (mprotect(addr, num_pages * w_get_page_size(), prot) < 0)
        return FALSE;

    return TRUE;
}

I can't publicly make it all available since the team is likely to use that same homework again.
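
The handler above calls a page_fault() helper that is not shown. A minimal sketch of what such a helper might look like follows, assuming a single tracked region. The names region_base, region_pages, region_page_size, region_high_water and region_create are hypothetical and are not taken from the original homework. The helper finds the page containing the faulting address, enables read/write access on it, and records the highest page touched so far. One caveat: POSIX does not list mprotect() as async-signal-safe, so calling it from a SIGSEGV handler is a common Linux practice rather than a guaranteed one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one tracked region. */
static char *region_base;                 /* start of the PROT_NONE mapping */
static size_t region_pages;               /* number of pages reserved */
static size_t region_page_size;           /* cached sysconf(_SC_PAGESIZE) */
static volatile size_t region_high_water; /* index of highest touched page, plus one */

/* Returns 1 if the fault was inside the region and the page is now usable. */
static int page_fault(void *address)
{
    uintptr_t a = (uintptr_t)address;
    uintptr_t base = (uintptr_t)region_base;
    size_t index;

    if (region_base == NULL || a < base ||
        a >= base + region_pages * region_page_size)
        return 0; /* a genuine crash, not a first touch of our region */

    index = (size_t)(a - base) / region_page_size;
    if (mprotect(region_base + index * region_page_size,
                 region_page_size, PROT_READ | PROT_WRITE) != 0)
        return 0;

    if (index + 1 > region_high_water)
        region_high_water = index + 1;
    return 1;
}

static void fault_handler(int signum, siginfo_t *info, void *context)
{
    (void)signum;
    (void)context;
    /* On return, the faulting instruction is retried and now succeeds. */
    if (!page_fault(info->si_addr))
        _exit(1);
}

/* Reserve `pages` pages with no access and install the handler.
   Returns 1 on success, 0 on failure. */
int region_create(size_t pages)
{
    struct sigaction sa = {0};
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    assert(pages > 0);
    if (ps <= 0)
        return 0;
    region_page_size = (size_t)ps;

    p = mmap(NULL, pages * region_page_size, PROT_NONE,
             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    region_base = p;
    region_pages = pages;
    region_high_water = 0;

    sa.sa_sigaction = fault_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, NULL) == 0;
}
```

After region_create(16), a store to region_base[0] would fault once, be granted access, and leave region_high_water at 1. Because the pages start as PROT_NONE, reads also count as touches in this sketch.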



Answer 2:

Allocate a big buffer however you like, then use the mprotect()* system call to make the tail of the buffer read-only. Register a signal handler for SIGSEGV that notes where in the buffer writes have been made, and have it call mprotect() again to enable writes to that part.

  • http://linux.die.net/man/2/mprotect
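
Once the handler has noted how far writes reached, the "write the modified pages to a file" step from the question reduces to writing that prefix of the buffer. The sketch below assumes the handler tracked a count of touched pages; flush_touched_pages and its parameters are hypothetical names used only for illustration. The file length comes out as a whole number of pages, which mirrors the page-sized extension described for AIX.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the first pages_touched * page_size bytes of base to path,
   replacing any existing file. Returns 0 on success, -1 on failure. */
int flush_touched_pages(const char *path, const char *base,
                        size_t pages_touched, size_t page_size)
{
    size_t used = pages_touched * page_size;
    size_t done = 0;
    int fd;

    assert(path != NULL && base != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* write() may transfer fewer bytes than requested, so loop. */
    while (done < used) {
        ssize_t n = write(fd, base + done, used - done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    return close(fd);
}
```

The pages being flushed must already be readable, which holds for every page the handler has unprotected.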


Answer 3:

I've contemplated similar things myself, and haven't found any way for mmap() to extend the backing file either.

Currently, I plan on trying two alternatives:

  • manually manage the file size, extending the file myself and calling mremap() afterwards
  • create a sparse file and hope that the VM will allocate the needed sectors when flushing dirty pages.

Honestly, I don't think sparse files would work, but it's worth a try.
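
The first alternative can be sketched roughly as follows, assuming Linux (mremap() is Linux-specific and needs _GNU_SOURCE) and a MAP_SHARED file mapping. The helper names map_new_file and grow_file_mapping are hypothetical. The file is extended with ftruncate() before the mapping is grown, so that the new pages are backed by the file. Since mmap() rejects a length of zero, the sketch starts with a non-empty file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) path, size it to len bytes, and map it shared.
   Returns the mapping or MAP_FAILED; the descriptor is stored in *fd_out. */
void *map_new_file(const char *path, size_t len, int *fd_out)
{
    void *p;
    int fd;

    assert(len > 0); /* mmap() fails for a zero-length mapping */

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return MAP_FAILED;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }
    *fd_out = fd;
    return p;
}

/* Extend the file to new_len, then grow the mapping to match.
   The mapping may move, so callers must use the returned address. */
void *grow_file_mapping(int fd, void *addr, size_t old_len, size_t new_len)
{
    assert(new_len > old_len);

    if (ftruncate(fd, (off_t)new_len) != 0)
        return MAP_FAILED;
    return mremap(addr, old_len, new_len, MREMAP_MAYMOVE);
}
```

Because MREMAP_MAYMOVE lets the kernel relocate the mapping, any pointers into the old mapping become invalid after a successful grow. That matters when the goal is minimal changes to the application code.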



Tags: linux posix aix