Question:
I've recently gotten the following error from my PHP:
WARNING: [pool www] child 42475 said into stderr: "*** glibc detected *** php-fpm: pool www: corrupted double-linked list: 0x00000000013fe680 ***"
I'm not very bothered by this issue, and not very interested in fixing it.
But I'm very interested in understanding what this error 'corrupted double-linked list' actually means, because I haven't seen it before. I believe I know what a double-linked list is, but I failed to produce a program that triggers this error.
Could somebody provide me a short snippet of code that causes the glibc to say 'corrupted double-linked list' when I compile and execute it?
Answer 1:
I have found the answer to my question myself:)
What I didn't understand was how glibc could differentiate between a segfault and a corrupted double-linked list, because as far as I understood, from glibc's perspective they should look like the same thing.
If I implement a double-linked list inside my program, how could glibc possibly know that this is a double-linked list rather than any other struct? It probably can't, so that's why I was confused.
Now I've looked at malloc/malloc.c inside the glibc's code, and I see the following:
/* Take a chunk off a bin list */
#define unlink(P, BK, FD) { \
  FD = P->fd; \
  BK = P->bk; \
  if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) \
    malloc_printerr (check_action, "corrupted double-linked list", P); \
  else { \
    FD->bk = BK; \
    BK->fd = FD; \
So now this suddenly makes sense. glibc can know that this is a double-linked list because the list is part of glibc itself. I had been confused because I thought glibc could somehow detect that some program was building a double-linked list, and I couldn't see how that would work. But if the double-linked list it is talking about is part of glibc itself, of course it knows it's a double-linked list.
I still don't know what has triggered this error. But at least I understand the difference between corrupted double-linked list and a Segfault, and how the glibc can know this struct is supposed to be a double-linked list:)
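To make the check in the macro concrete, here is a minimal sketch that mimics it on a toy list. Chunk, link_ring, checked_unlink and unlink_middle are hypothetical names invented for this illustration, not glibc code; the real malloc_chunk has more fields, and glibc aborts rather than returning false.

```cpp
#include <cassert>

// Toy stand-in for a glibc free-list chunk (the real struct has more fields).
struct Chunk {
    Chunk* fd;  // forward pointer: next chunk in the bin
    Chunk* bk;  // backward pointer: previous chunk in the bin
};

// Link three chunks into a circular doubly linked list: a <-> b <-> c <-> a.
void link_ring(Chunk& a, Chunk& b, Chunk& c) {
    a.fd = &b; b.fd = &c; c.fd = &a;
    a.bk = &c; b.bk = &a; c.bk = &b;
}

// Same test as the unlink macro: before removing p, verify that both
// neighbours still point back at p. Returns false instead of aborting.
bool checked_unlink(Chunk* p) {
    assert(p != nullptr);
    Chunk* fd = p->fd;
    Chunk* bk = p->bk;
    if (fd->bk != p || bk->fd != p) {
        return false;  // glibc would print "corrupted double-linked list" here
    }
    fd->bk = bk;
    bk->fd = fd;
    return true;
}

// Builds a ring, optionally simulates a stray write over b's forward
// pointer, then tries to unlink b.
bool unlink_middle(bool corrupt) {
    Chunk a{}, b{}, c{};
    link_ring(a, b, c);
    if (corrupt) {
        b.fd = &a;  // stands in for an overflow clobbering the list metadata
    }
    return checked_unlink(&b);
}
```

Calling unlink_middle(false) removes b cleanly, while unlink_middle(true) trips the consistency check. In a real program the pointers live in heap memory next to your own data, which is why an out-of-bounds write can damage them.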
Answer 2:
A heap overflow is usually (but not always) to blame for glibc warnings such as corrupted double-linked list, malloc(): memory corruption, and double free or corruption (!prev).
It can be reproduced with the following code:
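#include <vector>
using std::vector;
int main(int argc, const char *argv[])
{
    int *p = new int[3];   // 3 ints = 12 bytes on the heap
    vector<int> vec;
    vec.resize(100);       // a second heap allocation, typically placed after p's block
    p[6] = 1024;           // out of bounds: writes 12 bytes past the end of p's block
    delete[] p;            // the allocator may notice the damaged metadata here
    return 0;
}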
When compiled with g++ (4.5.4):
$ ./heapoverflow
*** glibc detected *** ./heapoverflow: double free or corruption (!prev): 0x0000000001263030 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7af26)[0x7f853f5d3f26]
./heapoverflow[0x40138e]
./heapoverflow[0x400d9c]
./heapoverflow[0x400bd9]
./heapoverflow[0x400aa6]
./heapoverflow[0x400a26]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f853f57b4bd]
./heapoverflow[0x4008f9]
======= Memory map: ========
00400000-00403000 r-xp 00000000 08:02 2150398851 /data1/home/mckelvin/heapoverflow
00602000-00603000 r--p 00002000 08:02 2150398851 /data1/home/mckelvin/heapoverflow
00603000-00604000 rw-p 00003000 08:02 2150398851 /data1/home/mckelvin/heapoverflow
01263000-01284000 rw-p 00000000 00:00 0 [heap]
7f853f559000-7f853f6fa000 r-xp 00000000 09:01 201329536 /lib64/libc-2.15.so
7f853f6fa000-7f853f8fa000 ---p 001a1000 09:01 201329536 /lib64/libc-2.15.so
7f853f8fa000-7f853f8fe000 r--p 001a1000 09:01 201329536 /lib64/libc-2.15.so
7f853f8fe000-7f853f900000 rw-p 001a5000 09:01 201329536 /lib64/libc-2.15.so
7f853f900000-7f853f904000 rw-p 00000000 00:00 0
7f853f904000-7f853f919000 r-xp 00000000 09:01 74726670 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libgcc_s.so.1
7f853f919000-7f853fb19000 ---p 00015000 09:01 74726670 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libgcc_s.so.1
7f853fb19000-7f853fb1a000 r--p 00015000 09:01 74726670 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libgcc_s.so.1
7f853fb1a000-7f853fb1b000 rw-p 00016000 09:01 74726670 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libgcc_s.so.1
7f853fb1b000-7f853fc11000 r-xp 00000000 09:01 201329538 /lib64/libm-2.15.so
7f853fc11000-7f853fe10000 ---p 000f6000 09:01 201329538 /lib64/libm-2.15.so
7f853fe10000-7f853fe11000 r--p 000f5000 09:01 201329538 /lib64/libm-2.15.so
7f853fe11000-7f853fe12000 rw-p 000f6000 09:01 201329538 /lib64/libm-2.15.so
7f853fe12000-7f853fefc000 r-xp 00000000 09:01 74726678 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libstdc++.so.6.0.18
7f853fefc000-7f85400fb000 ---p 000ea000 09:01 74726678 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libstdc++.so.6.0.18
7f85400fb000-7f8540103000 r--p 000e9000 09:01 74726678 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libstdc++.so.6.0.18
7f8540103000-7f8540105000 rw-p 000f1000 09:01 74726678 /usr/lib64/gcc/x86_64-pc-linux-gnu/4.8.1/libstdc++.so.6.0.18
7f8540105000-7f854011a000 rw-p 00000000 00:00 0
7f854011a000-7f854013c000 r-xp 00000000 09:01 201328977 /lib64/ld-2.15.so
7f854031c000-7f8540321000 rw-p 00000000 00:00 0
7f8540339000-7f854033b000 rw-p 00000000 00:00 0
7f854033b000-7f854033c000 r--p 00021000 09:01 201328977 /lib64/ld-2.15.so
7f854033c000-7f854033d000 rw-p 00022000 09:01 201328977 /lib64/ld-2.15.so
7f854033d000-7f854033e000 rw-p 00000000 00:00 0
7fff92922000-7fff92943000 rw-p 00000000 00:00 0 [stack]
7fff929ff000-7fff92a00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[1] 18379 abort ./heapoverflow
And when compiled with clang++ (6.0 (clang-600.0.56)):
$ ./heapoverflow
[1] 96277 segmentation fault ./heapoverflow
If you suspect you have written a bug like this, here are some hints for tracking it down.
First, compile the code with the debug flag (-g):
g++ -g foo.cpp
And then, run it using valgrind:
$ valgrind ./a.out
==12693== Memcheck, a memory error detector
==12693== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==12693== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==12693== Command: ./a.out
==12693==
==12693== Invalid write of size 4
==12693== at 0x400A25: main (foo.cpp:11)
==12693== Address 0x5a1c058 is 12 bytes after a block of size 12 alloc'd
==12693== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12693== by 0x4009F6: main (foo.cpp:8)
==12693==
==12693==
==12693== HEAP SUMMARY:
==12693== in use at exit: 0 bytes in 0 blocks
==12693== total heap usage: 2 allocs, 2 frees, 412 bytes allocated
==12693==
==12693== All heap blocks were freed -- no leaks are possible
==12693==
==12693== For counts of detected and suppressed errors, rerun with: -v
==12693== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
The bug is at the location valgrind reports for the invalid write: main (foo.cpp:11).
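Once the offending write is found, one way to keep the same mistake from silently damaging the heap again is bounds-checked access. The sketch below is not part of the original answer; checked_write is a hypothetical helper, and it assumes the buffer can be a std::vector instead of a raw new[] array.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Writes value at index using bounds-checked access.
// Returns true on success, false if index is outside the vector.
bool checked_write(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;  // at() throws std::out_of_range instead of writing past the end
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a vector of three elements, checked_write(p, 6, 1024) returns false and leaves the heap untouched, whereas p[6] = 1024 on a raw array is undefined behaviour.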
Answer 3:
For anyone who is looking for solutions here, I had a similar issue with C++:
malloc(): smallbin double linked list corrupted:
This was due to a function not returning a value it was supposed to.
std::vector<Object> generateStuff(std::vector<Object>& target) {
    std::vector<Object> returnValue;
    editStuff(target);
    // RETURN MISSING
}
I don't know why this compiled at all. There was probably a warning about it.
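For context: in C++, flowing off the end of a function with a non-void return type is undefined behaviour rather than a compile error, which is why the compiler accepted it. GCC and Clang can report it with -Wreturn-type, and -Werror=return-type turns the warning into an error. A corrected version might look like the sketch below; Object and editStuff are not shown in the answer, so the definitions here are hypothetical stand-ins.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: the answer does not show the real Object type.
struct Object {
    int value;
};

// Hypothetical stand-in for the answer's editStuff: modifies target in place.
void editStuff(std::vector<Object>& target) {
    for (Object& o : target) {
        o.value += 1;
    }
}

std::vector<Object> generateStuff(std::vector<Object>& target) {
    std::vector<Object> returnValue;
    editStuff(target);
    return returnValue;  // the statement that was missing
}
```

Without the return statement, the caller may go on to destroy a vector object that was never constructed, and a heap corruption report is one of the ways that can surface.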
Answer 4:
This can have different causes. Others have mentioned some candidates, and I will describe my case:
I got this error when using multi-threading (with both pthreads and std::thread). It occurred because I forgot to lock a variable that multiple threads could change at the same time.
The error shows up randomly, in some runs but not all, because a collision between two threads depends on timing.
The variable in my case was a global std::vector, and a function called by the threads tried to push_back() something into it. Once I protected it with a std::mutex, I never got this error again.
This may help someone.
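The fix described above could look roughly like this. It is a minimal sketch, not the answerer's actual code: g_values, g_values_mutex, append_range and run_workers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> g_values;  // global vector shared by all threads
std::mutex g_values_mutex;  // guards every access to g_values

// Called by each thread: appends count consecutive values starting at first.
void append_range(int first, int count) {
    for (int i = 0; i < count; ++i) {
        // Without this lock, concurrent push_back() calls race on the
        // vector's internal buffer and can corrupt the heap.
        std::lock_guard<std::mutex> lock(g_values_mutex);
        g_values.push_back(first + i);
    }
}

// Starts thread_count threads, waits for them, and returns the final size.
std::size_t run_workers(int thread_count, int per_thread) {
    {
        std::lock_guard<std::mutex> lock(g_values_mutex);
        g_values.clear();
    }
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back(append_range, t * per_thread, per_thread);
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return g_values.size();
}
```

With the lock in place, run_workers(4, 1000) always ends with 4000 elements. Removing the lock_guard in append_range brings back the intermittent failures described in the answer.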
Answer 5:
I ran into this error in some code where someone was calling exit() in one thread at about the same time as main() returned, so all the global/static destructors were being kicked off in two separate threads simultaneously.
This error also manifests as double free or corruption, as a segfault/sig11 inside exit() or inside malloc_consolidate, and likely in other ways. The call stack for the malloc_consolidate crash may resemble:
#0 0xabcdabcd in malloc_consolidate () from /lib/libc.so.6
#1 0xabcdabcd in _int_free () from /lib/libc.so.6
#2 0xabcdabcd in operator delete (...)
#3 0xabcdabcd in operator delete[] (...)
(...)
I couldn't get it to exhibit this problem while running under valgrind.
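One way to avoid this kind of race, sketched here under the assumption that the worker thread can be joined, is to have the worker record the status it wants and let only the main thread end the process. This is not from the original answer; g_exit_code, worker and run are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> g_exit_code{0};  // status requested by the worker, if any

// Instead of calling exit(3) from this thread, the worker only records
// the status it wants the process to end with.
void worker(bool fail) {
    if (fail) {
        g_exit_code.store(3);
    }
}

// Runs the worker and waits for it, so no other thread is still running
// when the caller (main) returns and process teardown starts.
int run(bool fail) {
    g_exit_code.store(0);
    std::thread t(worker, fail);
    t.join();
    return g_exit_code.load();
}
```

A main() that ends with return run(...); then goes through process teardown exactly once, from a single thread.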