I have this program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void main(void) {
char *buffer1 = malloc(sizeof(char));
char *buffer2 = malloc(sizeof(char));
strcpy(buffer2, "AA");
printf("before: buffer1 %s\n", buffer1);
printf("before: buffer2 %s\n", buffer2);
printf("address, buffer1 %p\n", &buffer1);
printf("address, buffer2 %p\n", &buffer2);
strcpy(buffer1, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB");
printf("after: buffer1 %s\n", buffer1);
printf("after: buffer2 %s\n", buffer2);
}
Which prints:
before: buffer1
before: buffer2 AA
address, buffer1 0x7ffc700460d8
address, buffer2 0x7ffc700460d0
after: buffer1 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
after: buffer2 B
What I expect this code to do:
As a char is 8 bits long, I expect that both buffers have the size of 1 byte/8 bits.
One ASCII char is 7 bits long, so I expect that two characters fit into each buffer.
As I allocate two buffers of one byte directly after each other, I expect that they are directly next to each other in memory. Therefore, I expect that the difference between the two addresses is 1 (since memory is addressed by byte?), and not 8 as my little program has printed.
As they are directly next to each other in memory, I expect buffer2 to be overflown with BB when I do strcpy(buffer1, BBBB); as the first BB are written to buffer1 and the rest overflows into buffer2. Therefore, I'd expect that strcpy(buffer1, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"); produces:
- a buffer overflow in buffer2, so that it has the value BBBBBBBBBBBBBBBBBBBBBBBBBBBBB or so. (How I calculated that: the number of B's which have been strcpy'd, minus 4 B's for both buffers.)
- a segmentation fault. I have only allocated 2 bytes (since the sizes of buffer1 and buffer2 are together 2 bytes). Since BBBBBBBBBBBBBBBBBBBBBBBBB fits into neither buffer1 nor buffer2 (because both are already filled), it would overflow into the next memory after buffer2. And because I have not allocated that, I'd expect a segmentation fault.
Therefore, I want to ask: Why does my program behave differently from my expectations? Where did I misunderstand things?
I have an x86_64 architecture and the above program is compiled with gcc version 6.3.1 20170306 (GCC).
What I do not ask for:
- I know that strcpy does no bounds checking, and its usage here is intentional. I want to investigate buffer overflows and such. Therefore, please don't write an answer/comment saying that I should use a different function than strcpy.
- As a char is 8 bits long, ...
This is correct for the stated architecture and operating system. (The C standard allows char to be more than 8 bits long, but this is very rare nowadays; the only example I know of is the TMS320 family of DSPs, where char may be 16 bits. It's not allowed to be smaller.)
Note that sizeof(char) == 1 by definition and therefore it is generally considered bad style to write sizeof(char) or foo * sizeof(char) in your code.
... i expect that both buffers have the size of 1 byte/8 bits.
This is also correct (but see below).
- One ASCII char is 7 bits long, i expect that two characters fit into each buffer.
This is not correct, for two reasons. First, nobody uses 7-bit ASCII anymore. Each character is in fact eight bits long. Second, two seven-bit characters do not fit into one eight-bit buffer. I see that there is some confusion on this point in the comments on the question, so let me attempt to explain further: Seven bits can represent 2^7 different values, just enough room for the 128 different characters defined by the original ASCII standard. Two seven-bit characters, together, can have 128 * 128 = 16384 = 2^14 different values; that requires 14 bits to represent, and will not fit into eight bits. You seem to have thought it was only 2 * 128 = 256 = 2^8, which would fit into eight bits, but that's not right; it would mean that once you saw the first character, there were only two possibilities for the second character, not 128.
- As I allocate two buffers of one byte directly after each other, i expect that they are directly next to each other in the memory. Therefore, i expect that the difference between each address is 1 (since the memory is addressed by byte?), and not 8 as my little program has printed.
As you have observed for yourself, your expectations are incorrect.
malloc is not required to put consecutive allocations next to each other; in fact, "are these allocations next to each other" may not be a meaningful question. The C standard goes out of its way to avoid requiring there to be any meaningful comparison between two pointers that don't point into the same array.
Now, you are working on a system with a "flat address space", so it is meaningful to compare pointers from successive allocations (provided you do it in your own brain, not with code) and there is a logical explanation for the gap between the allocations, but first I have to point out that you printed the wrong addresses:
printf("address, buffer1 %p\n", &buffer1);
printf("address, buffer2 %p\n", &buffer2);
This prints the addresses of the pointer variables, not the addresses of the buffers. You should have written
printf("address, buffer1 %p\n", (void *)buffer1);
printf("address, buffer2 %p\n", (void *)buffer2);
(The cast to void * is required because printf takes a variable argument list.) If you had written that, you would have seen output similar to
address, buffer1 0x55583d9bb010
address, buffer2 0x55583d9bb030
and the important thing to notice is that these addresses differ by 32 bytes (0x30 - 0x10 = 0x20), and not only that, they're both evenly divisible by 16.
malloc is required to produce buffers that are aligned as required for any type, even if you can't fit a value of that type into the allocation. An address is aligned to some number of bytes if it's evenly divisible by that number. On your system, the maximum alignment requirement is 16; you can confirm this by running this program:
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
int main(void) { printf("%zu\n", alignof(max_align_t)); return 0; }
So that means all addresses returned by malloc must be evenly divisible by 16. Therefore, when you ask malloc for two one-byte buffers, it has to leave at least a fifteen-byte gap between them. This does not mean that malloc rounded the size up; the C standard specifically forbids you to access the bytes in the gap. (I'm not aware of any modern, commercial CPUs that can enforce that prohibition, but debugging tools like valgrind will, and there have been experimental CPU designs that can do it. Also, often the space immediately before or after a malloc block contains data used internally by the malloc implementation, which you must not tamper with.)
There's a similar gap after the second allocation.
- As they are directly next to each other in the memory, i expect buffer 2 to be overflown with BB when I do strcpy(buffer1, BBBB); as the first BB are written to buffer1 and the rest overflows to buffer2.
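As previously discussed, they are not directly next to each other in memory, and each B takes up eight bits. If the two allocations are 16 bytes apart, one B is written to your first allocation, the next 15 to the gap between the two allocations, the 17th to the second allocation, 15 more after that to the gap after the second allocation, and the final B and the NUL to the space beyond. If they are 32 bytes apart, as in the example addresses 0x55583d9bb010 and 0x55583d9bb030, the 33rd and last B lands in the second allocation and the NUL immediately after it, which is exactly the "after: buffer2 B" you observed.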
I have only allocated 2 bytes (since the size of buffer1 and buffer2 are together 2 bytes). Since BBBBBBBBBBBBBBBBBBBBBBBBB doesn't fit into neither buffer1 nor buffer2 (because both are already filled), that would be overflown to the next memory buffer after buffer2. And because i have not allocated that, i'd expect an segmentation fault.
We've already discussed why your calculations were incorrect, but you did write all the way past the end of the gap after the second allocation and into the "space beyond", so why no segfault? This is because, at the level of operating system primitives, memory is allocated to applications in units called "pages", which are larger than the amount of memory you asked for. The CPU can only detect a buffer overrun and trigger a segmentation fault if the overrun crosses a page boundary into a page your program has not been given. You just didn't go far enough. I experimented with your program on my computer, which is very similar, and I need to write 132 kilobytes (a kilobyte is 1024 bytes; some people say that that's supposed to be called a kibibyte; they are wrong) beyond the end of buffer1 to get a segfault. Pages on my computer are only 4 kilobytes each, but malloc asks the OS for memory in even larger chunks because system calls are expensive.
Not getting a prompt segfault does not mean you are safe; there is an excellent chance you clobbered malloc's internal data, or another allocation somewhere in the "space beyond". If I take your original program and add a call to free(buffer1) at the end, it crashes in there.
First, please read What should main() return in C and C++?
Now focus on how you are allocating memory.
How much memory does malloc(1) allocate?
8 bytes of overhead are added to our need for a single byte, and the
total is smaller than the minimum of 32, so that's our answer:
malloc(1) allocates 32 bytes.
which undermines the assumptions you started from.
Note: "malloc(1) allocates 32 bytes" may be true for the implementation discussed at that link, but it is extremely implementation-dependent and will differ elsewhere.
On the other hand, if you had done:
char buffer1[1], buffer2[1];
instead of dynamically allocating memory, you would see different results. For example, on my system:
Georgioss-MacBook-Pro:~ gsamaras$ ./a.out // with malloc
before: buffer1
before: buffer2 AA
address, buffer1 0x7fff5ecb6bd8
address, buffer2 0x7fff5ecb6bd0
after: buffer1 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
after: buffer2 BBBBBBBBBBBBBBBBB
Georgioss-MacBook-Pro:~ gsamaras$ gcc -Wall main.c // no malloc
Georgioss-MacBook-Pro:~ gsamaras$ ./a.out
Abort trap: 6
Tip: The size has not officially been rounded up; accessing bytes beyond the requested size has Undefined Behavior. (If it were officially rounded up, this would have implementation-defined behavior.)
malloc does not guarantee location in memory. You cannot be sure, even with back-to-back calls, that the memory space will be contiguous. In addition, malloc often allocates more space than necessary. A segfault could well occur with your code, but is not guaranteed.
printf with the %s specifier prints characters from the pointer until a NUL (ASCII 0) character is encountered.
Remember, buffer overflow is undefined behavior, which means just that: you do not know exactly what will happen.