In a recent homework assignment I was told to use a long variable to store a result, since it may be a big number.
I decided to check whether it would really matter on my system (Intel Core i5, 64-bit Windows 7, GNU GCC compiler) and found that the following code:
#include <stdio.h>

int main(void) {
    /* sizeof yields a size_t, so the matching conversion is %zu (C99), not %d */
    printf("sizeof(char) => %zu\n", sizeof(char));
    printf("sizeof(short) => %zu\n", sizeof(short));
    printf("sizeof(short int) => %zu\n", sizeof(short int));
    printf("sizeof(int) => %zu\n", sizeof(int));
    printf("sizeof(long) => %zu\n", sizeof(long));
    printf("sizeof(long int) => %zu\n", sizeof(long int));
    printf("sizeof(long long) => %zu\n", sizeof(long long));
    printf("sizeof(long long int) => %zu\n", sizeof(long long int));
    return 0;
}
produces the following output:
sizeof(char) => 1
sizeof(short) => 2
sizeof(short int) => 2
sizeof(int) => 4
sizeof(long) => 4
sizeof(long int) => 4
sizeof(long long) => 8
sizeof(long long int) => 8
In other words, on my system int and long are the same size, and whatever is too big for int to hold will be too big for long to hold as well.
The homework assignment itself is not the issue here. I wonder how, on a system where int is narrower than long, I should assign a long to an int.
I'm aware of the fact that there are numerous closely related questions on this subject, but I feel that their answers do not give me a complete understanding of what will or may happen in the process.
Basically I'm trying to figure out the following:
- Should I cast long to int before the assignment, or, since long is not a different data type but merely a modifier, is it considered harmless to assign directly?
- What happens on systems where long > int? Will the result be undefined (or unpredictable), or will the extra parts of the variable simply be dropped?
- How does casting from long to int work in C?
- How does assignment from long to int work in C when I don't use a cast?
The language guarantees that int is at least 16 bits, long is at least 32 bits, and long can represent at least all the values that int can represent.
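As a rough sketch, those guarantees can be written down as compile-time checks. This assumes a C11 compiler (for _Static_assert); the helper name long_is_wider_than_int is made up for illustration.

```c
#include <limits.h>

/* These checks restate the minimum ranges the C standard requires,
   so they should pass on any conforming implementation. */
_Static_assert(INT_MAX >= 32767, "int has at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long has at least 32 bits");
_Static_assert(LONG_MAX >= INT_MAX && LONG_MIN <= INT_MIN,
               "long can represent every int value");

/* Hypothetical helper: returns 1 if long has a strictly larger range
   than int on this implementation, 0 if the ranges are the same. */
int long_is_wider_than_int(void) {
    return LONG_MAX > INT_MAX;
}
```

On a system like the one in the question, where both types are 32 bits, long_is_wider_than_int() would return 0.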
If you assign a long value to an int object, it will be implicitly converted. There's no need for an explicit cast; it would merely specify the same conversion that's going to happen anyway.
On your system, where int and long happen to have the same size and range, the conversion is trivial; it simply copies the value.
On a system where long is wider than int, if the value won't fit in an int, then the result of the conversion is implementation-defined. (Or, starting in C99, it can raise an implementation-defined signal, but I don't know of any compilers that actually do that.) What typically happens is that the high-order bits are discarded, but you shouldn't depend on that. (The rules are different for unsigned types; the result of converting a signed or unsigned integer to an unsigned type is well defined.)
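To make the signed/unsigned difference concrete, here is a small sketch. The function names to_unsigned and to_int_unchecked are invented for this example; only the in-range and unsigned results are things you can rely on.

```c
#include <limits.h>   /* UINT_MAX, used when checking the results */

/* Conversion to an unsigned type is well defined: the result is the
   original value reduced modulo UINT_MAX + 1. */
unsigned int to_unsigned(long value) {
    return (unsigned int)value;
}

/* Conversion to int preserves the value when it fits. When it does not
   fit, the result is implementation-defined (or a signal is raised),
   so portable code should not rely on it. */
int to_int_unchecked(long value) {
    return (int)value;
}
```

For instance, to_unsigned(-1L) is UINT_MAX on every conforming implementation, and to_int_unchecked(42L) is always 42. What to_int_unchecked(LONG_MAX) returns on a system with a 64-bit long is up to the implementation.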
If you need to safely assign a long value to an int object, you can check that it will fit before doing the assignment:
#include <limits.h> /* for INT_MIN, INT_MAX */
/* ... */
int i;
long li = /* whatever */;
if (li >= INT_MIN && li <= INT_MAX) {
    i = li; /* value fits, so the implicit conversion preserves it */
}
else {
    /* do something else? */
}
The details of "something else" are going to depend on what you want to do.
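One way to package that check, purely as a sketch: a helper that reports failure to the caller instead of deciding what "something else" means. The name long_to_int_checked and its bool-plus-out-parameter shape are my own choices here, not anything standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores value in *out and returns true when it
   fits in an int; returns false and leaves *out untouched otherwise. */
bool long_to_int_checked(long value, int *out) {
    if (value < INT_MIN || value > INT_MAX) {
        return false;
    }
    *out = (int)value; /* in range, so the value is preserved */
    return true;
}
```

A caller would write something like `if (!long_to_int_checked(li, &i)) { /* handle the error */ }`. On a system where int and long have the same range, the function always succeeds.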
One correction: int and long are always distinct types, even if they happen to have the same size and representation. Arithmetic types are freely convertible, so this often doesn't make any difference, but for example int* and long* are distinct and incompatible types; you can't assign a long* to an int*, or vice versa, without an explicit (and potentially dangerous) cast.
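If you want to see that distinction for yourself, a C11 _Generic selection is one way to do it. This is only an illustration; the macro TYPE_NAME and the two wrapper functions are hypothetical names.

```c
/* C11 _Generic picks a branch by the static type of its operand.
   Listing both int and long is only legal because they are distinct
   types; two compatible types in one selection would not compile. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    long: "long",                  \
    default: "other")

/* The literal 0 has type int; the literal 0L has type long. */
const char *name_of_int_literal(void)  { return TYPE_NAME(0); }
const char *name_of_long_literal(void) { return TYPE_NAME(0L); }
```

This compiles and selects different branches even on a system where sizeof(int) == sizeof(long).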
And if you find yourself needing to convert a long value to int, the first thing you should do is reconsider your code's design. Sometimes such conversions are necessary, but more often they're a sign that the int to which you're assigning should have been defined as a long in the first place.
A long can always represent all values of int.
If the value at hand can be represented by the type of the variable you assign to, then the value is preserved.
If it can't be represented, then for a signed destination type the result is formally implementation-defined (or an implementation-defined signal is raised), while for an unsigned destination type it is specified as the original value modulo 2^n, where n is the number of bits in the value representation (which is not necessarily all the bits in the destination).
In practice, on modern machines you also get wrapping for signed types.
That's because modern machines use two's complement form to represent signed integers, without any bits used to denote "invalid value" or such – i.e., all bits used for value representation.
With an n-bit value representation, any integer value x is mapped to x + K*2^n, with the integer constant K chosen such that the result is in the range where half of the possible values are negative.
Thus, for example, with 32-bit int the value -7 is represented as bit pattern number -7 + 2^32 = 2^32 - 7, so that if you display the number that the bit pattern stands for as an unsigned integer, you get a pretty large number.
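The -7 example can be checked with a few lines of C. This sketch assumes the exact-width types from <stdint.h> exist on your platform (they do on mainstream ones); as_unsigned_32 is a made-up name.

```c
#include <stdint.h>

/* Conversion to uint32_t is defined as reduction modulo 2^32, so for a
   negative input this returns value + 2^32, which is also the number
   its two's complement bit pattern stands for when read as unsigned. */
uint32_t as_unsigned_32(int32_t value) {
    return (uint32_t)value;
}
```

Here as_unsigned_32(-7) is 4294967289, that is, 2^32 - 7, or 0xFFFFFFF9 in hexadecimal.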
The reason that this is called two's complement is that it makes sense for the binary numeral system, the base two numeral system. For the binary numeral system there's also a ones' (note the placement of the apostrophe) complement. Similarly, for the decimal numeral system there's ten's complement and nines' complement. With a 4-digit ten's complement representation you would represent -7 as 10000 - 7 = 9993. That's all, really.
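The decimal analogy is easy to play with in code. This is just a toy sketch; tens_complement_4 is a hypothetical helper that maps an integer to its 4-digit ten's complement representation.

```c
/* Toy helper: 4-digit ten's complement, i.e. the value reduced
   modulo 10^4 into the range 0..9999. The first % may yield a negative
   remainder in C, so 10000 is added before reducing again. */
int tens_complement_4(int value) {
    return (value % 10000 + 10000) % 10000;
}
```

With this, tens_complement_4(-7) gives 9993 and tens_complement_4(-1) gives 9999, while small non-negative values such as 7 map to themselves.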