I am implementing a divide and conquer polynomial algorithm so I can benchmark it against an OpenCL implementation, but I can't get `malloc` to work. When I run the program, it allocates a bunch of stuff, checks some things, then sends the `size/2` to the algorithm. Then when I hit the `malloc` line again it spits out this:
malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed. Aborted
The line in question is:
    int *mult(int size, int *a, int *b) {
        int *out, i, j, *tmp1, *tmp2, *tmp3, *tmpa1, *tmpa2, *tmpb1, *tmpb2, d, *res1, *res2;
        fprintf(stdout, "size: %d\n", size);
        out = (int *)malloc(sizeof(int) * size * 2);
    }
I checked `size` with a `fprintf`, and it is a positive integer (usually 50 at that point). I tried calling `malloc` with a plain number as well and I still get the error. I'm just stumped at what's going on, and nothing from Google I have found so far is helpful.
Any ideas what's going on? I'm trying to figure out how to compile a newer GCC in case it's a compiler error, but I really doubt it.
To give you a better understanding of why this happens, I'd like to expand upon @r-samuel-klatchko's answer a bit.
When you call `malloc`, what is really happening is a bit more complicated than just giving you a chunk of memory to play with. Under the hood, `malloc` also keeps some housekeeping information about the memory it has given you (most importantly, its size), so that when you call `free`, it knows things like how much memory to free. This information is commonly kept right before the memory location returned to you by `malloc`. More exhaustive information can be found on the internet™, but the (very) basic idea is something like this:

Building on this (and simplifying things greatly), when you call `malloc`, it needs to get a pointer to the next part of memory that is available. One very simple way of doing this is to look at the previous bit of memory it gave away, and move `size` bytes further down (or up) in memory. With this implementation, you end up with your memory looking something like this after allocating `p1`, `p2` and `p3`:

So, what is causing your error?
Well, imagine that your code erroneously writes past the amount of memory you've allocated (either because you allocated less than you needed, as was your problem, or because you're using the wrong boundary conditions somewhere in your code). Say your code writes so much data to `p2` that it starts overwriting what is in `p3`'s `size` field. When you next call `malloc`, it will look at the last memory location it returned, look at its size field, move to `p3 + size` and then start allocating memory from there. Since your code has overwritten `size`, however, this memory location is no longer after the previously allocated memory.

Needless to say, this can wreak havoc! The implementors of `malloc` have therefore put in a number of "assertions", or checks, that try to do a bunch of sanity checking to catch this (and other issues) if they are about to happen. In your particular case, these assertions are violated, and thus `malloc` aborts, telling you that your code was about to do something it really shouldn't be doing.

As previously stated, this is a gross oversimplification, but it is sufficient to illustrate the point. The glibc implementation of `malloc` is more than 5k lines, and there have been substantial amounts of research into how to build good dynamic memory allocation mechanisms, so covering it all in a SO answer is not possible. Hopefully this has given you a bit of a view of what is really causing the problem though!

We got this error because we forgot to multiply by `sizeof(int)`. Note that the argument to `malloc(..)` is a number of bytes, not a number of machine words or whatever.
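To make the size-header idea and the missing `sizeof(int)` concrete, here is a minimal sketch of a toy bump allocator in C. It is only an illustration under simplifying assumptions and is not how glibc works: `toy_alloc`, `toy_heap_is_consistent` and `demo` are hypothetical names made up for this example, and the "heap" is a plain static array so that the deliberate overflow stays inside memory the program owns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE  1024
#define HEADER_SIZE sizeof(size_t)

static unsigned char arena[ARENA_SIZE]; /* the toy heap (hypothetical) */
static size_t top = 0;                  /* offset of the first unused byte */

/* Round n up to a multiple of the header size so headers stay evenly spaced. */
size_t round_up(size_t n) {
    return (n + HEADER_SIZE - 1) / HEADER_SIZE * HEADER_SIZE;
}

/* Hand out `size` bytes, preceded by a header that records the block size. */
void *toy_alloc(size_t size) {
    if (size > ARENA_SIZE) return NULL;
    size = round_up(size);
    if (top + HEADER_SIZE + size > ARENA_SIZE) return NULL;
    memcpy(arena + top, &size, HEADER_SIZE);   /* housekeeping data */
    void *payload = arena + top + HEADER_SIZE; /* what the caller sees */
    top += HEADER_SIZE + size;
    return payload;
}

/* Sanity check: follow the size headers from the start of the arena.
 * Returns 1 if they lead exactly to `top`, 0 if a header looks corrupted. */
int toy_heap_is_consistent(void) {
    size_t off = 0;
    while (off < top) {
        size_t size;
        if (top - off < HEADER_SIZE) return 0;
        memcpy(&size, arena + off, HEADER_SIZE);
        if (size > top - off - HEADER_SIZE) return 0;
        off += HEADER_SIZE + size;
    }
    return off == top;
}

/* Allocate room for 10 ints, optionally forgetting sizeof(int), then fill
 * all 10 ints. Returns the result of the consistency check afterwards. */
int demo(int forget_sizeof) {
    size_t count = 10;
    size_t bytes = forget_sizeof ? count : count * sizeof(int);

    top = 0; /* reset the toy heap */
    void *p1 = toy_alloc(16);
    void *p2 = toy_alloc(bytes); /* too small when sizeof(int) is forgotten */
    void *p3 = toy_alloc(16);
    assert(p1 != NULL && p2 != NULL && p3 != NULL);

    /* Set every byte of the 10 ints to 0xFF, as the caller intended. */
    memset(p2, 0xFF, count * sizeof(int));
    return toy_heap_is_consistent();
}
```

In this sketch `demo(0)` returns 1 because the headers still chain up to `top`, while `demo(1)` returns 0 because the bytes written past `p2` replaced the size recorded for `p3`. A real allocator's assertions react to the same kind of inconsistency, only with far more elaborate bookkeeping.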
My alternative solution to using Valgrind:

I'm very happy now because I just helped my friend debug a program. His program had this exact problem (`malloc()` causing abort), with the same error message from GDB.

I compiled his program with

And then ran `gdb new`. When the program gets terminated by `SIGABRT` caused in a subsequent `malloc()`, a whole lot of useful information is printed:

Let's take a look at the output, especially the stack trace:

The first part says there's an invalid write operation at `new.c:59`. That line reads

The second part says the memory that the bad write happened on was created at `new.c:55`. That line reads

That's it. It took me less than half a minute to locate the bug that had confused my friend for a few hours. He managed to locate the failure, but it was a subsequent `malloc()` call that failed, and he was not able to spot the error in the earlier code.

To sum up: try GCC's `-fsanitize=address` option; it can be very helpful when debugging memory issues.

I was porting one application from Visual C to gcc over Linux and I had the same problem with
I moved the same code to a SUSE distribution (on another computer) and I don't have any problem there.
I suspect that the problem is not in our programs but in libc itself.
99.9% likely that you have corrupted memory (over- or under-flowed a buffer, wrote to a pointer after it was freed, called free twice on the same pointer, etc.)
Run your code under Valgrind to see where your program did something incorrect.
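As a rough illustration of the kind of mistake these tools catch, here is a small sketch in C. `sum_of_squares` and its `off_by_one` switch are hypothetical, invented only for this example: with the switch on, the loop writes one element past the end of the heap buffer, which is undefined behavior and typically goes unnoticed until a later `malloc` or `free`.

```c
#include <stdlib.h>

/* Hypothetical helper: stores the squares 0..n-1 in a heap buffer and
 * returns their sum, or -1 on bad input or allocation failure.
 * With off_by_one != 0 the loop deliberately runs one step too far and
 * writes buf[n], which is outside the allocated block. */
long sum_of_squares(int n, int off_by_one) {
    if (n <= 0) return -1;

    int *buf = malloc(sizeof *buf * (size_t)n);
    if (buf == NULL) return -1;

    int limit = off_by_one ? n + 1 : n;
    long sum = 0;
    for (int i = 0; i < limit; i++) {
        buf[i] = i * i; /* out of bounds when i == n */
        sum += buf[i];
    }

    free(buf);
    return sum;
}
```

Calling `sum_of_squares(50, 1)` from a `main`, building with `gcc -g`, and running the binary under `valgrind` should flag the out-of-bounds write at the `buf[i] = i * i` line together with the `malloc` call that created the block. Building with `-fsanitize=address` instead should stop the program at the same write.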
You are probably overrunning the allocated memory somewhere. The underlying software then doesn't pick up on it until you call `malloc`.
There may be a guard value clobbered that is being caught by malloc.
Edit: added this link for bounds-checking help:
http://www.lrde.epita.fr/~akim/ccmp/doc/bounds-checking.html
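The guard value idea mentioned above can be sketched with a thin wrapper around `malloc`. This is an assumption-laden illustration, not a description of what glibc actually does: `guarded_alloc`, `guard_intact`, `guarded_free`, `demo_guard` and the `GUARD` pattern are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary illustrative pattern; none of its bytes are zero. */
static const uint32_t GUARD = 0xDEADBEEFu;

/* Block layout: [size_t size][payload ...][uint32_t guard].
 * Note: the payload is only aligned for types up to size_t in this sketch. */
void *guarded_alloc(size_t size) {
    if (size > SIZE_MAX - sizeof(size_t) - sizeof GUARD) return NULL;
    unsigned char *raw = malloc(sizeof(size_t) + size + sizeof GUARD);
    if (raw == NULL) return NULL;
    memcpy(raw, &size, sizeof size);                         /* record size */
    memcpy(raw + sizeof size + size, &GUARD, sizeof GUARD);  /* place guard */
    return raw + sizeof size;
}

/* Returns 1 if the guard after the payload is intact, 0 if it was clobbered. */
int guard_intact(const void *payload) {
    const unsigned char *p = payload;
    size_t size;
    uint32_t guard;
    memcpy(&size, p - sizeof size, sizeof size);
    memcpy(&guard, p + size, sizeof guard);
    return guard == GUARD;
}

void guarded_free(void *payload) {
    if (payload != NULL) free((unsigned char *)payload - sizeof(size_t));
}

/* Zero `written` bytes of a block that holds `allocated` bytes, then report
 * whether the guard survived (1), was clobbered (0), or allocation failed (-1). */
int demo_guard(size_t allocated, size_t written) {
    /* Keep the overrun inside the underlying malloc block so the demo
     * itself stays well defined. */
    assert(written <= allocated + sizeof GUARD);

    unsigned char *p = guarded_alloc(allocated);
    if (p == NULL) return -1;
    memset(p, 0, written);
    int ok = guard_intact(p);
    guarded_free(p);
    return ok;
}
```

Here `demo_guard(16, 16)` returns 1, while `demo_guard(16, 17)` returns 0 because the extra byte lands on the guard. The overrun is only noticed when the guard is checked, which mirrors why the abort shows up at a later `malloc` call rather than at the faulty write.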