Weird behavior of malloc()

Posted 2019-04-16 15:51

Question:

I am trying to understand the answers to my question

what happens when tried to free memory allocated by heap manager, which allocates more than asked for?

I wrote this program and am puzzled by its output:

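#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
  char *p, *q;
  p = malloc(1);
  // since malloc allocates at least 1 byte (note: this copy writes 38 bytes into a 1-byte buffer)
  strcpy(p, "01234556789abcdefghijklmnopqrstuvwxyz");
  q = malloc(2);
  //    free(q);
  printf("q=%s\n", q);
  printf("p=%s\n", p);

  return 0;
}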

Output

q=vwxyz
p=01234556789abcdefghijklm!

Can anyone explain this behavior? Or is this implementation-specific?

Also, if free(q) is uncommented, I get SIGABRT.

Answer 1:

You are copying more bytes to *p than you have allocated, overwriting whatever might have been at the memory locations after the allocated space.

When you then call malloc again, it takes a part of memory it knows to be unused at the moment (which happens to be a few bytes after *p this time), writes some bookkeeping information there and returns a new pointer to that location.

The bookkeeping information malloc writes happens to start with a '!' in this run, followed by a zero byte, so your first string is truncated. The new pointer happens to point to the end of the memory you overwrote before.

All this is implementation-specific and might lead to different results on each run, or depending on the phase of the moon. The second call to malloc() would also be entirely within its rights to just crash the program in horrible ways (especially since you might be overwriting memory that malloc uses internally).
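
A minimal sketch of the fix this answer implies, assuming the goal is simply to hold the whole string: request strlen(src) + 1 bytes so that the terminator fits as well. The helper name copy_string is made up for this illustration and does not appear in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): return a heap copy of src,
 * or NULL if malloc fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    assert(src != NULL);              /* precondition of this sketch */
    size_t len = strlen(src) + 1;     /* +1 for the terminating '\0' */
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);        /* copies exactly len bytes, no overflow */
    return dst;
}
```

With p = copy_string("01234556789abcdefghijklmnopqrstuvwxyz"), the string stays inside memory that belongs to p, so a later malloc(2) does not disturb it and free(p) behaves normally.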



Answer 2:

You just got lucky this time: this is undefined behavior, so don't count on it.

Usually, but depending on the OS, memory is allocated in "pages" (i.e. blocks of many bytes). malloc(), on the other hand, hands out memory from those pages in a more granular way, and there is overhead associated with each allocation managed through malloc.

The signal you are getting from free is most probably caused by the fact that you corrupted the memory management data by writing past what was allocated for p, i.e. writing over the overhead information the memory manager uses to keep track of memory blocks.



Answer 3:

This is a classic heap overflow. p has only 1 byte, but the heap manager pads the allocation (to 32 bytes in your case). q is allocated right after p, so it naturally gets the next available spot. For example, if the address of p is 0x1000, the address assigned to q is 0x1020. This explains why q points into part of the string.

The more interesting question is why p is only "01234556789abcdefghijklm" and not "01234556789abcdefghijklmnopqrstuvwxyz". The reason is that the memory manager uses the gaps between allocations for its internal bookkeeping. From the memory manager's perspective the memory layout is as follows: p D q, where D is the memory manager's internal data structure (0x1010 to 0x1020 in our example). While allocating memory for q, the heap manager writes its data to the bookkeeping area (0x1010 to 0x1020). A byte that gets changed to 0 truncates the string, since it is treated as the NUL terminator.
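
The padding arithmetic can be sketched with a toy model. This is not a real allocator API: the function name toy_chunk_size and all of its parameters are assumptions chosen for illustration, and actual heap managers differ in their details.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT a real allocator API: estimate the spacing between
 * consecutive small allocations. All parameters are assumptions:
 *   header - bytes of bookkeeping stored in front of each block
 *   align  - alignment of returned pointers (must be a power of two)
 *   min    - smallest chunk the allocator ever hands out            */
size_t toy_chunk_size(size_t request, size_t header, size_t align, size_t min)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Add the header, then round up to the next multiple of align. */
    size_t size = (request + header + align - 1) & ~(align - 1);
    return size < min ? min : size;
}
```

With header = 8, align = 16 and min = 32 (values picked to resemble a typical 64-bit glibc build, which is an assumption here), toy_chunk_size(1, 8, 16, 32) gives 32, matching the 0x1000 to 0x1020 spacing in the example. Under the same assumption, the '!' in the output is ASCII 0x21, which is what a size field of 32 with a low flag bit set would look like when printed as a character.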



Answer 4:

THE VALUE OF "p":

you allocated enough space to fit this: ""

[[ strings are null terminated, remember? you don't see it, but it's there -- so that's one byte used up. ]]

but you are trying to store this: "01234556789abcdefghijklmnopqrstuvwxyz"

the result, therefore, is that the "stuff" starting with "123.." is being stored beyond the memory you allocated -- possibly writing over other "stuff" elsewhere. as such your results will be messy, and as "jidupont" said you're lucky that it doesn't just crash.

OUTPUT OF PRINTING [BROKEN] "p"

as said, you've written way past the end of "p"; but malloc doesn't know this. so when you asked for another block of memory for "q", maybe it gave you the memory following what it gave you for "p"; and maybe it aligned the memory (typical) so its pointer is rounded up to some nice number; and then maybe it uses some of this memory to store bookkeeping information you're not supposed to be concerned with. but you don't know, do you? you're not supposed to know either -- you're just not supposed to write to memory that you haven't allocated yourself!

and the result? you see some of what you expected -- but it's truncated! because ... another block was perhaps allocated IN the memory you used (and used without permission, i might add), or something else owned that block and changed it, and in any case some values were changed -- resulting in: "01234556789abcdefghijklm!". again, lucky that things didn't just explode.

FREEING "q"

if you free "q", then try to access it -- as you are doing by trying to print it -- you will (usually) get a nasty error. this is well deserved. you shouldn't uncomment that "free(q)". but you also shouldn't try to print "q", because you haven't put anything there yet! for all you know, it might contain gibberish, and so print will continue until it encounters a NUL byte -- which may not happen until the end of the world -- or, more likely, until your program accesses yet more memory that it shouldn't, and crashes because the OS is not happy with you. :)
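
a hedged sketch of the safer pattern hinted at here: give the buffer defined contents before printing it, and don't touch it after free(). the helper name new_empty_string is invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a buffer of `size` bytes that already holds
 * a valid empty string, so printing it with %s is well defined.
 * Returns NULL if the allocation fails; the caller must free() it. */
char *new_empty_string(size_t size)
{
    assert(size > 0);          /* need room for at least the '\0' */
    return calloc(size, 1);    /* calloc zero-fills, so buf[0] == '\0' */
}
```

after q = new_empty_string(2), printing "q" with %s shows an empty string instead of gibberish. once free(q) has been called, setting q = NULL makes an accidental later use easier to spot.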



Answer 5:

It shouldn't be that puzzling that intentionally misusing these functions will give nonsensical results.

Two consecutive mallocs are not guaranteed to give you two consecutive areas of memory. malloc may choose to allocate more than the amount of memory you requested, but not less if the allocation succeeds. The behavior of your program when you choose to overwrite unallocated memory is not guaranteed to be predictable.

This is just the way C is. You can easily misuse the returned memory areas from malloc and the language doesn't care. It just assumes that in a correct program you will never do so, and everything else is up for grabs.



Answer 6:

Malloc is a function just like yours :)

There are a lot of malloc implementations, so I won't go into unnecessary detail.

On the first call, malloc asks the system for memory. For this example let's say 4096 bytes, which is a common memory page size. So you call malloc asking for 1 byte, and malloc asks the system for 4096 bytes. Next, it uses a small part of this memory to store internal data such as the positions of the available blocks. Then it cuts off one part of this block and hands it back to you.
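
The "cut one part off a page" idea can be sketched as a toy allocator. Everything here (the names toy_page, toy_used and toy_alloc, the 16-byte alignment, and the static array standing in for memory obtained from the system) is an assumption made for illustration; a real malloc is far more elaborate.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096   /* assumed page size, as in the text */
#define TOY_ALIGN     16     /* assumed alignment of returned blocks */

/* One "page" standing in for memory obtained from the system. */
static _Alignas(TOY_ALIGN) unsigned char toy_page[TOY_PAGE_SIZE];
static size_t toy_used = 0;  /* bookkeeping: bytes handed out so far */

/* Toy allocator (illustration only): cut the next piece off the page.
 * Returns NULL when the page cannot satisfy the request. */
void *toy_alloc(size_t size)
{
    assert(toy_used <= TOY_PAGE_SIZE);
    if (size == 0 || size > TOY_PAGE_SIZE - toy_used)
        return NULL;
    /* Round the piece up so that the next block stays aligned. */
    size_t rounded = (size + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
    void *block = &toy_page[toy_used];
    toy_used += rounded;
    return block;
}
```

Two calls, toy_alloc(1) followed by toy_alloc(2), return addresses 16 bytes apart, so writing 37 bytes through the first pointer would silently run into the second block, which mirrors what the question observed. Unlike a real malloc, this sketch keeps its bookkeeping outside the page and never reuses freed blocks.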

An internal algorithm tries to reuse blocks after a call to free, to avoid asking the system for memory again.

With this little explanation you can now understand why your code appears to work.

You are writing into memory that malloc requested from the system. This behavior doesn't bother the system, because you stay within the memory allocated to your process. The problem is that you can't know for sure that you are not writing over critical parts of your program's memory. This kind of error is called a buffer overflow, and it causes many of the "mystical bugs".

A good way to catch them is to use valgrind on Linux. This tool will tell you if you are writing or reading where you are not supposed to.

Is that clear enough?



Answer 7:

I suggest reading this introduction.

Pointers And Memory

It helped me understand the difference between stack and heap allocation; it is a very good introduction.



Tags: c free malloc heap