I wrote a small program to check how many bytes a char occupies in memory, and it appears that a char actually occupies 4 bytes. I understand this is mostly because of word alignment, and I don't see the advantage of a char being only 1 byte. Why not use 4 bytes for char?
#include <stdio.h>

int main(void)
{
    int a;
    char b;
    int c;

    a = 0;
    b = 'b';
    c = 1;
    /* %p expects a void *, so cast each address */
    printf("%p\n", (void *)&a);
    printf("%p\n", (void *)&b);
    printf("%p\n", (void *)&c);
    return 0;
}
Output:
0x7fff91a15c58
0x7fff91a15c5f
0x7fff91a15c54
Update:
I don't believe that malloc will allocate only 1 byte for a char, even when sizeof(char) is passed as the argument, because malloc keeps a header for each allocation and makes sure that the header is word aligned. Any comments?
Update2:
If you are asked to use memory efficiently, without padding, is the only way to write a special memory allocator? Or is it possible to disable padding?
You have int, char, int
See the image here under "Why Restrict Byte Alignment?"
http://www.eventhelix.com/realtimemantra/ByteAlignmentAndOrdering.htm
Byte 0 Byte 1 Byte 2 Byte 3
0x1000
0x1004 X0 X1 X2 X3
0x1008
0x100C Y0 Y1 Y2
If the compiler had stored them packed as 4 bytes, 1 byte and 4 bytes, int c would straddle a word boundary. Retrieving it would then take two memory accesses instead of one, plus some bit-shifting to assemble the actual value of c for use as an int.
Alignment
Let's look at your output for printing the addresses of a, b, and c:
Output: 0x7fff91a15c58 0x7fff91a15c5f 0x7fff91a15c54
Notice that b isn't on a 4-byte boundary, and that a and c are right next to each other? Here is what it looks like in memory, with each row taking up 4 bytes, and the rightmost column being the 0th place:
| b | x | x | x | 0x5c5c
-----------------
| a | a | a | a | 0x5c58
-----------------
| c | c | c | c | 0x5c54
This is the compiler's way of optimizing space and keeping things word aligned. Even though the address of b is 0x5c5f, b isn't actually taking up 4 bytes: it occupies one byte, and the other three bytes of that word are simply unused. If you take the same code and add a short d, you'll see this:
| b | x | d | d | 0x5c5c
-----------------
| a | a | a | a | 0x5c58
-----------------
| c | c | c | c | 0x5c54
Here the address of d is 0x5c5c. Shorts are aligned to two bytes, so you will still have one byte of unused memory between d and b. Add in another char e, and you'll get:
| b | e | d | d | 0x5c5c
-----------------
| a | a | a | a | 0x5c58
-----------------
| c | c | c | c | 0x5c54
Here's my code and its output. Note that my addresses differ from yours, but it's the least significant digit of each address that we're really concerned about anyway:
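#include <stdio.h>

int main(void)
{
    int a;
    char b;
    int c;
    short d;
    char e;

    a = 0;
    b = 'b';
    c = 1;
    /* d and e are never read; only their addresses matter here */
    printf("%p\n", (void *)&a);
    printf("%p\n", (void *)&b);
    printf("%p\n", (void *)&c);
    printf("%p\n", (void *)&d);
    printf("%p\n", (void *)&e);
    return 0;
}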
$ ./a.out
0xbfa0bde8
0xbfa0bdef
0xbfa0bde4
0xbfa0bdec
0xbfa0bdee
Malloc
The man page of malloc says that it "allocates size bytes and returns a pointer to the allocated memory." It also says that it will "return a pointer to the allocated memory, which is suitably aligned for any kind of variable". In my testing, repeated calls to malloc(1) returned addresses in "double word" increments, but I wouldn't count on this.
Caveats
My code was run on a 32-bit x86 machine. Other machines might vary slightly, and some compilers may optimize in different ways, but the ideas should hold true.
The variable itself doesn't occupy 4 bytes of memory. It occupies 1 byte and is then followed by 3 bytes of padding, since the next variable on the stack is an int and therefore has to be word aligned.
In a case like the one below, you will find that the address of variable anotherChar is 1 byte away from that of b. The two chars are then followed by 2 bytes of padding before int c:
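#include <stdio.h>

int main(void)
{
    int a;
    char b;
    char anotherChar;
    int c;

    a = 0;
    b = 'b';
    c = 1;
    /* anotherChar is never read; only its address matters here */
    printf("%p\n", (void *)&a);
    printf("%p\n", (void *)&b);
    printf("%p\n", (void *)&anotherChar);
    printf("%p\n", (void *)&c);
    return 0;
}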
I'm assuming it has something to do with the packing of the variables on the stack. I believe in your example it's forcing the integers to be 4-byte aligned, so there need to be 3 bytes of padding before (or after) the char variable (depending on your compiler, I suppose).
To answer the final part of your question: Why not use 4 bytes for char?
Why not use 4 million bytes for char[1000000]?
This is due to alignment constraints. The size of a char is only 1 byte; it is the int that is being aligned to a multiple of 4 bytes. A char can also be followed by other chars (or, say, a short), which have more lenient alignment constraints. In those cases, if a char really were 4 bytes as you suggest, we would consume more space than necessary.