This is a program I ran:
#include <stdio.h>
int main(void)
{
int y = 1234;
char *p = &y;
int *j = &y;
printf("%d %d\n", *p, *j);
}
I am slightly confused about the output. What I'm seeing is:

-46 1234

I wrote this program as an experiment and wasn't sure what it was going to output. I was expecting possibly one byte from y.

What is happening "behind-the-scenes" here? How does dereferencing p give me -46?
As pointed out by others, I had to add an explicit cast to avoid a constraint violation. I am not changing that line from char *p = &y; to char *p = (char *)&y;, so that I am not invalidating the answers below. This program is not causing any UB, as pointed out here.
There are a couple of issues with the code as written.
First of all, you are invoking undefined behavior by trying to print the numeric representation of a char object using the %d conversion specifier. Online C 2011 draft, §7.21.6.1, subclause 9:

    If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Yes, objects of type char are promoted to int when passed to variadic functions; printf is special, and if you want the output to be well-defined, then the type of the argument and the conversion specifier must match up. To print the numeric value of a char with %d, or an unsigned char argument with %u, %o, or %x, you must use the hh length modifier as part of the conversion spec.
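For this program, the call could look something like this (a sketch; with hh, printf converts the promoted argument back to signed char before printing):

printf("%hhd %d\n", *p, *j);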
The second issue is that the line

char *p = &y;

is a constraint violation - char * and int * are not compatible types, and may have different sizes and/or representations. Thus, you must explicitly cast the source to the target type:

char *p = (char *)&y;

The one exception to this rule occurs when one of the operands is void *; then the cast isn't necessary.
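For instance (a minimal sketch of both the cast and the void * exception):

char *p = (char *)&y;  /* explicit cast: int * does not convert to char * implicitly */
void *v = &y;          /* fine without a cast: object pointers convert to void * */
char *q = v;           /* and back from void *, also without a cast (in C) */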
Having said all that, I took your code and added a utility that dumps the address and contents of objects in the program, to see what y, p, and j look like on my system (SLES-10, gcc 4.1.2). I'm on an x86 system, which is little-endian, so it stores multi-byte objects starting with the least-significant byte at the lowest address.
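A minimal sketch of such a dump utility (not the original; the addresses it prints will vary by system and run):

#include <stdio.h>

/* Print an object's address and its bytes, lowest address first. */
static void dump(const char *label, const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    printf("%s @ %p:", label, (void *)obj);
    for (size_t i = 0; i < size; i++)
        printf(" %02x", bytes[i]);
    putchar('\n');
}

Called as dump("y", &y, sizeof y), dump("p", &p, sizeof p), and dump("j", &j, sizeof j), it would show p and j holding the same address (that of y), and y's bytes on a machine like mine as d2 04 00 00.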
On a little-endian system, the addressed byte is the least-significant byte, which in this case is 0xd2 (210 unsigned, -46 signed). In a nutshell, you're printing the signed, decimal representation of that single byte.
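In other words (a sketch; the signed result assumes a two's-complement machine):

unsigned char lsb = 0xd2;                            /* the byte at y's lowest address */
printf("%u %d\n", (unsigned)lsb, (signed char)lsb);  /* prints "210 -46" */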
As for the broader question, the type of the expression *p is char and the type of the expression *j is int; the compiler simply goes by the type of the expression. The compiler keeps track of all objects, expressions, and types as it translates your source to machine code. So when it sees the expression *j, it knows that it's dealing with an integer value and generates machine code appropriately. When it sees the expression *p, it knows it's dealing with a char value.
If you have something like int *p = &y;, then dereferencing pointer p will correctly read the integer's bytes, because you declared it to be a pointer to int. The compiler knows how many bytes to read from the pointed-to type, as if by the sizeof operator. Generally the size of an int is 4 bytes (on 32- and 64-bit platforms), but it is machine-dependent, which is why sizeof is used to determine the correct size, and that many bytes are read.

For your code:
Now pointer p points to y, but we have declared it to be a pointer to char, so it will only read one byte (or however many bytes a char occupies). 1234 in binary is represented as

00000000 00000000 00000100 11010010

If your machine is little-endian, it stores the bytes in reverse order:

11010010 00000100 00000000 00000000

Here 11010010 is at (hypothetical) address 00, 00000100 is at address 01, and so on.
So now if you dereference pointer p, it will read only the first byte, 11010010, and the output will be -46 for signed char or 210 for unsigned char (according to the C standard, the signedness of plain char is implementation-defined; in this case it is signed). On your PC, negative numbers are represented in two's complement, so the most-significant bit is the sign bit: a leading 1 denotes a negative number, and

11010010 = -128 + 64 + 16 + 2 = -46
If you dereference pointer j, it will read all the bytes of the int, since we declared it to be a pointer to int, and the output will be 1234.

In general, if you declare a pointer j as int *j, then *j will read sizeof(int) bytes (here 4, but machine-dependent). The same goes for char or any other data type: a pointer to it will read as many bytes as that type's size; a char is 1 byte.

As others have pointed out, char *p = &y; is a constraint violation, because char * and int * are not compatible types; you need to cast explicitly and write char *p = (char *)&y; instead.
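Putting both fixes together, a corrected version of the program could look like this (a sketch; the first number assumes a little-endian machine where plain char is signed):

#include <stdio.h>

int main(void)
{
    int y = 1234;
    char *p = (char *)&y;  /* explicit cast fixes the constraint violation */
    int *j = &y;

    /* hh matches the char-sized value; %d matches the int */
    printf("%hhd %d\n", *p, *j);  /* prints "-46 1234" on such a machine */
    return 0;
}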
(Please note this answer refers to the original form of the question, which asked how the program knew how many bytes to read, etc. I'm keeping it around on that basis, despite the rug having been pulled out from under it.)
A pointer refers to a location in memory that contains a particular object and must be incremented/decremented/indexed with a particular stride size, reflecting the sizeof the pointed-to type.

The observable value of the pointer itself (e.g. through std::cout << ptr) need not reflect any recognisable physical address, nor does ++ptr need to increment said value by 1, sizeof(*ptr), or anything else. A pointer is just a handle to an object, with an implementation-defined bit representation. That representation doesn't and shouldn't matter to users. The only thing users should use the pointer for is to... well, point to stuff. Talk of its address is nonportable and only useful in debugging.

Anyway, simply, the compiler knows how many bytes to read/write because the pointer is typed, and that type has a defined sizeof, representation, and mapping to physical addresses. So, based on that type, operations on ptr will be compiled into appropriate instructions to calculate the real hardware address (which, again, need not correspond to the observable value of ptr), read the right sizeof number of memory 'bytes', add/subtract the right number of bytes so it points at the next object, etc.
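For instance, the stride of pointer arithmetic follows the pointed-to type (a sketch; the printed addresses are whatever your platform produces):

#include <stdio.h>

int main(void)
{
    int arr[2] = {1, 2};
    int *ip = arr;
    char *cp = (char *)arr;

    /* ip + 1 advances by sizeof(int) bytes; cp + 1 advances by exactly 1 */
    printf("%p %p\n", (void *)ip, (void *)(ip + 1));
    printf("%p %p\n", (void *)cp, (void *)(cp + 1));
    return 0;
}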
First, read the warning, which says

warning: initialization from incompatible pointer type [enabled by default]
 char *p = &y;

which means you should cast explicitly to fix the incompatible-pointer initialization (as pointed out by @John Bode; see also §7.21.6.1, subclause 9, for the printf issue):

char *p = (char *)&y;

Here y is a local variable, so it is stored in the stack section of RAM. On a Linux/x86 machine, integers are stored in memory in little-endian format. Assume the 4 bytes of memory reserved for y run from 0x100 to 0x103.
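Under that assumption, y's bytes would be laid out like this (addresses are hypothetical):

0x100: d2 (11010010)  <- lowest address: least-significant byte; p and j point here
0x101: 04 (00000100)
0x102: 00 (00000000)
0x103: 00 (00000000)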
As pointed out above, j and p both point to the same address, 0x100, but when the compiler evaluates *p, since p is a pointer to (here, signed) char, it checks the sign bit; the sign bit is 1, so one thing is sure: the number printed will be negative.
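You can inspect that bit directly (a sketch; reading the byte through unsigned char avoids any sign interpretation):

unsigned char b = *(const unsigned char *)p;  /* the byte 0xd2 */
printf("sign bit: %d\n", (b >> 7) & 1);       /* prints "sign bit: 1" */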
If the sign bit is 1, i.e. the number is negative (and negative numbers are stored in memory as two's complement), then while printing, if you use the %u conversion with the hh length modifier (i.e. %hhu), which prints the unsigned equivalent, the sign bit gets no special treatment, and whatever data is in that one byte gets printed as an unsigned value: 210.
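For example (a sketch):

printf("%hhu\n", *p);  /* prints 210: the same bits, read as unsigned */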
Finally, in a statement like printf("%d\n", *j), when dereferencing j, which is a pointer to (signed) int, the compiler checks bit 31 for the sign; that bit is 0, so the output is a positive number, and that is 1234.