When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the is operator:
If the evaluation takes place in the function it returns True, but if it is done outside it returns False.
>>> def func():
... a = 1000
... b = 1000
... return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)
Since the is operator compares the id()s of the objects involved, this means that a and b point to the same int instance when assigned inside function func. Outside of it, on the contrary, they point to different objects.
Why is this so?
Note: I am aware of the difference between identity (is) and equality (==) operations as described in Understanding Python's "is" operator. I'm also aware of the caching that Python performs for integers in the range [-5, 256], as described in "is" operator behaves unexpectedly with integers.
That caching doesn't apply here, since the numbers are outside that range, and I do want to evaluate identity and not equality.
At the interactive prompt, entries are compiled in "single" mode, which processes one complete statement at a time. The compiler itself (in Python/compile.c) tracks the constants in a dictionary called u_consts that maps the constant object to its index.
In the compiler_add_o() function, you see that before adding a new constant (and incrementing the index), the dict is checked to see whether the constant object and index already exist. If so, they are reused.
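To make that lookup-before-insert idea concrete, here is a toy Python model of a per-statement constant table. ConstTable and its add() method are hypothetical names for illustration, not CPython APIs, and CPython's real key function is more elaborate than the (type, value) tuple used here. One table stands for one compiled statement.

```python
class ConstTable:
    """Toy model of a compiler's per-code-block constant table.

    Hypothetical illustration only; CPython does this in C (u_consts).
    """

    def __init__(self):
        self.index = {}    # (type, value) -> position in self.consts
        self.consts = []   # ordered constants, playing the role of co_consts

    def add(self, value):
        # Key on the type as well as the value so 1, 1.0 and True stay distinct
        # (they compare equal and hash alike, so a plain value key would merge them).
        key = (type(value), value)
        if key in self.index:
            # Constant already known in this block: reuse the existing slot.
            return self.index[key]
        self.index[key] = len(self.consts)
        self.consts.append(value)
        return self.index[key]


# One statement, such as the def of func: both literals go into one table.
# int("1000") builds a fresh object each time, like parsing the literal twice.
unit = ConstTable()
i = unit.add(int("1000"))
j = unit.add(int("1000"))
print(i == j, unit.consts[i] is unit.consts[j])   # True True

# Two separate interactive statements: two tables, two int objects.
first, second = ConstTable(), ConstTable()
k = first.add(int("1000"))
m = second.add(int("1000"))
print(first.consts[k] is second.consts[m])        # False
```

The second lookup in the same table returns the existing slot, so both "literals" resolve to one object. Separate tables never see each other's entries.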
In short, that means that repeated constants in one statement (such as your function definition) are folded into one singleton. In contrast, your a = 1000 and b = 1000 are two separate statements, so no folding takes place.
FWIW, this is all just a CPython implementation detail (i.e. not guaranteed by the language). This is why the references given here are to the C source code rather than to the language specification, which makes no guarantees on the subject.
Hope you enjoyed this insight into how CPython works under the hood :-)
tl;dr:
As the reference manual's chapter on the execution model states, a block is a piece of Python program text that is executed as a unit: a module, a function body and a class definition are blocks, and each command typed interactively is a block of its own.
This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal 1000, so id(a) == id(b) will yield True.
In the second case, you have two distinct code objects, each with its own object for the literal 1000, so id(a) != id(b).
Take note that this behavior doesn't manifest with int literals only; you'll get similar results with, for example, float literals.
Of course, comparing objects (except for explicit is None tests) should always be done with the equality operator == and not with is.
Everything stated here applies to the most popular implementation of Python, CPython. Other implementations might differ, so no assumptions should be made when using them.
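The function case from the tl;dr above can be checked in a few lines. This sketch assumes a recent CPython 3. It reuses func from the question, prints the summary that dis.code_info produces, and counts how many 1000 constants the function's code object holds. The exact layout of the printed summary differs between CPython versions, so no output is shown for it.

```python
import dis


def func():
    a = 1000
    b = 1000
    return a is b


# Human-readable summary of func's code object; look for the "Constants" section.
print(dis.code_info(func))

# Both assignments load the same constant, so co_consts holds exactly one 1000.
consts = func.__code__.co_consts
print(consts.count(1000))   # 1
print(func())               # True
```

Counting the constant instead of indexing co_consts avoids depending on where in the tuple the 1000 happens to sit.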
Longer Answer:
To get a clearer view, and to verify this seemingly odd behaviour, we can look directly at the code objects for each of these cases using the dis module.
For the function func:
Along with all its other attributes, a function object has a __code__ attribute that lets you peek into the compiled bytecode for that function. Using dis.code_info(func), we get a nicely formatted view of all the attributes stored in the function's code object.
We're only interested in the Constants entry for function func. In it, we can see two values, None (always present) and 1000. There is only a single int instance representing the constant 1000, and this is the value that both a and b are assigned when the function is invoked.
Accessing this value is easy via func.__code__.co_consts[1], so another way to view our a is b evaluation inside the function would be func.__code__.co_consts[1] is func.__code__.co_consts[1]. This, of course, evaluates to True, because we're referring to the same object.
For each interactive command:
As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.
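This one-block-per-command behaviour can be approximated with the standard library's code module. InteractiveInterpreter.runsource() compiles each source string on its own in "single" mode, much as the prompt does, so it behaves like typing each line at >>>. Treat it as an approximation of the real REPL, not the REPL itself. The commented results assume a default (GIL) CPython build.

```python
import code

# Each runsource() call is compiled on its own, like one line typed at ">>>".
interp = code.InteractiveInterpreter()
interp.runsource("a = 1000")
interp.runsource("b = 1000")
interp.runsource("print(a is b)")      # False on a default CPython build

# Entering both assignments as a single command compiles them together.
interp.runsource("c = 1000; d = 1000")
interp.runsource("print(c is d)")      # True
```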
We can get the code objects for each command via the compile built-in. Call them com1 (for a = 1000) and com2 (for b = 1000), each compiled separately in "single" mode.
For each assignment statement, we get a similar-looking code object whose Constants entry contains the literal 1000.
The code object for com2 looks the same but has a fundamental difference: com1 and com2 each hold a different int instance representing the literal 1000. This is why, when we compare the two constants with is via the co_consts attribute, we get False, which agrees with what we actually got.
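The comparison above can be sketched with compile() and exec(). The names com1 and com2 follow the text. The helper literal() is hypothetical: it fishes the 1000 entry out of co_consts without hard-coding its position in the tuple. As before, the identity results assume a default CPython build.

```python
# Compile each interactive command separately, in "single" mode.
com1 = compile("a = 1000", "<stdin>", "single")
com2 = compile("b = 1000", "<stdin>", "single")


def literal(code_obj, value=1000):
    """Return the constant equal to value from code_obj.co_consts (hypothetical helper)."""
    return next(c for c in code_obj.co_consts if c == value)


print(literal(com1) == literal(com2))   # True: equal values
print(literal(com1) is literal(com2))   # False: two separate int objects

# Running the two code objects reproduces the interactive result.
namespace = {}
exec(com1, namespace)
exec(com2, namespace)
print(namespace["a"] is namespace["b"])  # False
```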
Different code objects, different contents.
Note: I was somewhat curious as to how exactly this happens in the source code and after digging through it I believe I finally found it.
During the compilation phase, the constants that end up in co_consts are tracked in a dictionary object. In compile.c we can see this dictionary being created, empty, whenever the compiler enters a new scope, so each code block starts with its own table. During compilation this dictionary is checked for already existing constants. See Raymond Hettinger's answer for a bit more on this.
Caveats:
Chained statements make the identity check evaluate to True
It should now be clearer why entering a = 1000; b = 1000 on a single line at the interactive prompt, and then checking a is b, evaluates to True. By chaining the two assignments together, we tell the interpreter to compile them together. As in the case of the function object, only one object for the literal 1000 is created, resulting in a True value when evaluated.
Execution on a module level yields True again. As previously mentioned, the reference manual states that a module is one code block, just like a function body or a class definition.
So the same premise applies: we will have a single code object (for the module) and so, as a result, single values stored for each different literal.
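Both caveats can be checked the same way. This sketch compiles the chained statement as one "single"-mode command and a two-line module in "exec" mode, then executes each in a fresh namespace. The source strings are made up for the example.

```python
# Chained: both assignments form one interactive command, hence one code object.
chained = compile("a = 1000; b = 1000", "<stdin>", "single")

# Module level: the whole file is a single code block.
module = compile("a = 1000\nb = 1000\n", "<module>", "exec")

for code_obj in (chained, module):
    # Only one 1000 constant is stored per code object ...
    print(code_obj.co_consts.count(1000))   # 1
    # ... so both names end up bound to the same int object.
    ns = {}
    exec(code_obj, ns)
    print(ns["a"] is ns["b"])               # True
```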
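The same doesn't apply to mutable objects. Unless we explicitly bind two names to the same mutable object (for example with a = b = []), every newly created mutable object has its own identity, so after a = []; b = [] the check a is b is always False.
Again, the documentation specifies this: for immutable types, operations that compute new values may return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. After a = 1; b = 1, a and b may or may not refer to the same object, depending on the implementation. After c = []; d = [], however, c and d are guaranteed to refer to two different, newly created empty lists (whereas c = d = [] assigns the same object to both).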
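A short illustration of that guarantee. Unlike the examples above, this part is not CPython-specific, since the language itself requires newly created mutable objects to be distinct:

```python
# Two separate list displays always create two distinct list objects.
c = []
d = []
print(c is d, c == d)   # False True

# A chained assignment binds both names to one and the same list.
e = f = []
e.append(1)
print(e is f, f)        # True [1]
```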