Does Linux malloc() behave differently on ARM vs x86?

Posted 2019-03-17 00:40

There are a lot of questions about memory allocation on this site, but I couldn't find one that specifically addresses my concern. This question seems closest, and it led me to this article, so... I compared the behavior of the three demo programs it contains on a (virtual) desktop x86 Linux system and an ARM-based system.

My findings are detailed here, but the quick summary is: on my desktop system, the demo3 program from the article seems to show that malloc() always lies about the amount of memory allocated—even with swap disabled. For example, it cheerfully 'allocates' 3 GB of RAM, and then invokes the OOM killer when the program starts to actually write to all that memory. With swap disabled, the OOM killer gets invoked after writing to only 610 MB of the 3 GB malloc() has supposedly made available.

The purpose of the demo program is to demonstrate this well-documented 'feature' of Linux, so none of this is too surprising. But the behavior is different on our i.MX6-based embedded target at work, where malloc() appears to be telling the truth about how much RAM it allocates(?) The program below (reproduced verbatim from the article) always gets OOM-killed in the second loop when i == n:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define N       10000

int main (void) {
        int i, n = 0;
        char *pp[N];

        for (n = 0; n < N; n++) {
                pp[n] = malloc(1<<20);
                if (pp[n] == NULL)
                        break;
        }
        printf("malloc failure after %d MiB\n", n);

        for (i = 0; i < n; i++) {
                memset (pp[i], 0, (1<<20));
                printf("%d\n", i+1);
        }

        return 0;
}

So my question, in a nutshell, is: why does the demo3 program—or some other unlucky OOM killer victim—always get killed long before i == n on my desktop system (implying that malloc() is a liar), but it only gets killed when i == n on our i.MX6 ARM target (implying that malloc() may be telling the truth)? Is this difference a function of the libc and/or kernel version, or something else? Can I conclude that malloc() will always return NULL if allocation fails on this target?

NOTE: Some details on each system (please note that overcommit_memory and overcommit_ratio have the same values for both):

# Desktop system
% uname -a
Linux ubuntu 3.8.0-33-generic #48-Ubuntu SMP Wed Oct 23 17:26:34 UTC 2013 i686 i686 i686 GNU/Linux
% /lib/i386-linux-gnu/libc.so.6 
GNU C Library (Ubuntu EGLIBC 2.17-0ubuntu5.1) stable release version 2.17, by Roland McGrath et al.
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.7.3.
Compiled on a Linux 3.8.13 system on 2013-09-30.
Available extensions:
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/eglibc/+bugs>.
% cat /proc/sys/vm/overcommit_memory
0
% cat /proc/sys/vm/overcommit_ratio 
50

# i.MX6 ARM system
# uname -a
Linux acmewidgets 3.0.35-ts-armv7l #2 SMP PREEMPT Mon Aug 12 19:27:25 CST 2013 armv7l GNU/Linux
# /lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.7.3.
Compiled on a Linux 3.0.35 system on 2013-08-14.
Available extensions:
    crypt add-on version 2.1 by Michael Glad and others
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
libc ABIs: UNIQUE
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
# cat /proc/sys/vm/overcommit_memory
0
# cat /proc/sys/vm/overcommit_ratio 
50

BACKGROUND: We're trying to decide how to handle low memory conditions in our media-oriented embedded application, and want to know whether we can—for this specific target—trust malloc() to alert us when allocation fails. My experience with desktop Linux apps made me think the answer was certainly not, but now I'm not so sure.

1 Answer
Animai°情兽
Answered 2019-03-17 01:25

A little background

malloc() doesn't lie; your kernel's virtual memory subsystem does, and this is common practice on most modern operating systems. When you use malloc(), what really happens is something like this:

  1. The libc implementation of malloc() checks its internal state and tries to optimize your request using a variety of strategies (reusing a preallocated chunk, allocating more memory than requested in advance...). This means the implementation affects performance and slightly changes the amount of memory requested from the kernel, but it is not really relevant when checking the "big numbers", as you're doing in your tests.

  2. If there's no space in a preallocated chunk of memory (remember, chunks of memory are usually pretty small, in the order of 128KB to 1MB), it will ask the kernel for more memory. The actual syscall varies from one kernel to another (mmap(), vm_allocate()...) but its purpose is mostly the same.

  3. The VM subsystem of the kernel will process the request, and if it finds it to be "acceptable" (more on this subject later), it will create a new entry in the memory map of the requesting task (I'm using UNIX terminology, where task is a process with all its state and threads), and return the starting value of said map entry to malloc().

  4. malloc() will take note of the newly allocated chunk of memory, and will return the appropriate answer to your program.

OK, so now your program has successfully malloc'ed some memory, but the truth is that not a single page (4 KB on x86) of physical memory has actually been allocated for your request yet (well, this is an oversimplification, since some pages may incidentally have been used to store info about the state of the memory pool, but it makes the point easier to illustrate).

So, what happens when you try to access this recently malloc'ed memory? A page fault. Surprisingly, this is a relatively little known fact, but your system is generating page faults all the time. Your program is then interrupted, the kernel takes control, checks whether the faulting address corresponds to a valid map entry, takes one or more physical pages and links them into the task's map.

If your program tries to access an address which is not inside any map entry in your task, the kernel will not be able to resolve the fault, and will send it the SIGSEGV signal (or the equivalent mechanism on non-UNIX systems) to point out the problem. If the program doesn't handle that signal itself, it will be killed with the infamous Segmentation Fault error.

So physical memory is not allocated when you call malloc(), but when you actually access that memory. This allows the OS to do some nifty tricks like disk paging, ballooning and overcommitting.

This way, when you ask how much memory a specific process is using, you need to look at two different numbers:

  • Virtual Size: The amount of memory that has been requested, even if it's not actually used.

  • Resident Size: The memory it is really using, backed by physical pages.

How much overcommit is enough?

In computing, resource management is a complex issue. You have a wide range of strategies, from the most strict capability-based systems to the much more relaxed behavior of kernels like Linux (with overcommit_memory == 0), which will basically allow you to request memory up to the maximum map size allowed for a task (a limit that depends on the architecture).

In the middle, you have OSes like Solaris (mentioned in your article), which limit the amount of virtual memory for a task to the sum of (physical pages + swap disk pages). But don't be fooled by the article you referenced: this is not always a good idea. If you're running a Samba or Apache server with hundreds to thousands of independent processes running at the same time (which leads to a lot of wasted virtual memory due to fragmentation), you'll have to configure a ridiculous amount of swap, or your system will run out of virtual memory while still having a lot of free RAM.

But why does memory overcommit work differently on ARM?

It doesn't. At least it shouldn't, but ARM vendors have an insane tendency to introduce arbitrary changes to the kernels they distribute with their systems.

In your test case, the x86 machine is working as expected. As you're allocating memory in small chunks, and you have vm.overcommit_memory set to 0, you're allowed to fill all your virtual space, which is somewhere around the 3 GB mark because you're running on a 32-bit machine (if you try this on 64 bits, the loop will run until n == N). Obviously, when you try to use that memory, the kernel detects that physical memory is getting scarce, and activates the OOM killer countermeasure.

On ARM it should be the same. Since it isn't, two possibilities come to mind:

  1. overcommit_memory is set to the NEVER policy (2), perhaps because someone forced it that way in the kernel.

  2. You're reaching the maximum allowed map size for the task.

As you get different values for the malloc phase on each ARM run, I would discard the second option. Make sure overcommit_memory is enabled (value 0) and rerun your test. If you have access to those kernel sources, take a look at them to make sure the kernel honors this sysctl (as I said, some ARM vendors like to do nasty things to their kernels).

As a reference, I've run demo3 under QEMU emulating a versatilepb board and on an Efika MX (i.MX515). The first one stopped malloc'ing at the 3 GB mark, as expected on a 32-bit machine, while the other did so earlier, at 2 GB. This may come as a surprise, but if you take a look at its kernel config (https://github.com/genesi/linux-legacy/blob/master/arch/arm/configs/mx51_efikamx_defconfig), you'll see this:

CONFIG_VMSPLIT_2G=y
# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0x80000000

The kernel is configured with a 2GB/2GB split, so the system is behaving as expected.
