Something about the id of objects of type str (in Python 2.7) puzzles me. The str type is immutable, so I would expect that once a string is created, it will always have the same id. I don't think I'm phrasing this well, so instead I'll post an example input and output sequence.
>>> id('so')
140614155123888
>>> id('so')
140614155123848
>>> id('so')
140614155123808
So at this point, the id changes every time. However, once a variable points at that string, things change:
>>> so = 'so'
>>> id('so')
140614155123728
>>> so = 'so'
>>> id(so)
140614155123728
>>> not_so = 'so'
>>> id(not_so)
140614155123728
So it looks like the id freezes once a variable holds that value. Indeed, after del so and del not_so, the output of id('so') starts changing again.
This is not the same behaviour as with (small) integers.
I know there is no real connection between immutability and having the same id; still, I am trying to figure out the source of this behaviour. I believe that someone who's familiar with Python's internals would be less surprised than me, so I am trying to reach the same point...
Update
Trying the same with a different string gave different results...
>>> id('hello')
139978087896384
>>> id('hello')
139978087896384
>>> id('hello')
139978087896384
Now it is equal...
A simpler way to understand the behaviour is to read the following: Data Types and Variables.
The section "A String Pecularity" illustrates your question using special characters as an example.
In your first example a new instance of the string 'so' is created each time, hence the different id. In the second example you are binding the string to a variable, and Python can then maintain a shared copy of the string.
So while Python is not guaranteed to intern strings, it will frequently reuse the same string, and the is operator may mislead. It's important to know that you shouldn't use id or is to check strings for equality.
To demonstrate this, one way I've discovered to force a new string in Python 2.6 at least:
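As a minimal sketch of why == is the right test, assuming CPython: the names so and new_so are illustrative, and ''.join(...) is just one way I'd assume works for building an equal string at runtime, not necessarily the approach referred to above. Whether the two objects turn out to be identical is an implementation detail, so only the == result should be relied on.

```python
# Build an equal string at runtime instead of using a second literal.
so = 'so'
new_so = ''.join(['s', 'o'])

# Value comparison: this is the reliable way to compare strings.
print(so == new_so)

# Identity comparison: the result depends on interpreter internals
# (interning, caching), so it should not be used to test equality.
print(so is new_so)
```

The first print always reports equality; the second may differ between interpreters and versions, which is exactly why is should be avoided here.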
and here's a bit more Python exploration:
CPython does not promise to intern strings by default, but in practice, a lot of places in the Python codebase do reuse already-created string objects. A lot of Python internals use (the C-equivalent of) the intern() function call to explicitly intern Python strings, but unless you hit one of those special cases, two identical Python string literals will produce different strings.
Python is also free to reuse memory locations, and Python will also optimize immutable literals by storing them once, at compile time, with the bytecode in code objects. The Python REPL (interactive interpreter) also stores the most recent expression result in the _ name, which muddles up things some more.
As such, you will see the same id crop up from time to time.
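To make explicit interning concrete, here is a small sketch. It assumes nothing beyond the documented behaviour of intern() (a built-in in Python 2, available as sys.intern in Python 3); the variable names a and b are illustrative.

```python
try:
    intern                     # built-in in Python 2
except NameError:
    from sys import intern     # lives in the sys module in Python 3

# Two equal strings built at runtime, containing spaces so they do not
# look like identifiers.
a = ''.join(['not', ' ', 'an', ' ', 'identifier'])
b = ''.join(['not', ' ', 'an', ' ', 'identifier'])

print(a == b)                   # True: same characters
print(intern(a) is intern(b))   # True: both map to the one interned object
```

Explicit interning is what guarantees a shared object; without it, sharing is left to the interpreter's discretion.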
Running just the line id(<string literal>) in the REPL goes through several steps:
1. The line is compiled, which includes creating a constant for the string object. The constants stored with the compiled bytecode are, in this case, a string 'foo' and the None singleton.
2. On execution, the string is loaded from the code constants, and id() returns the memory location. The resulting int value is bound to _, as well as printed.
3. The code object is not referenced by anything, its reference count drops to 0 and the code object is deleted. As a consequence, so is the string object.
Python can then perhaps reuse the same memory location for a new string object, if you re-run the same code. This usually leads to the same memory address being printed if you repeat this code. This does depend on what else you do with your Python memory.
ID reuse is not predictable; if in the meantime the garbage collector runs to clear circular references, other memory could be freed and you'll get new memory addresses.
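The compile and execute steps described above can be reproduced by hand, roughly like this. This is a sketch assuming CPython; the 'single' mode mimics how the REPL compiles one interactive line, and the exact contents and order of co_consts may vary between versions.

```python
# Step 1: compile one interactive line; the string literal becomes a constant.
code = compile("id('foo')", '<stdin>', 'single')
consts = code.co_consts
print(consts)   # contains 'foo'; other entries depend on the Python version

# Step 2: run it. In 'single' mode the result is passed to sys.displayhook,
# which prints it and binds it to _ in the builtins module.
exec(code)
```

In the real REPL nothing keeps a reference to the code object afterwards, which is what allows the string constant to be freed and its address reused.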
Next, the Python compiler will also intern any Python string stored as a constant, provided it looks enough like a valid identifier. The Python code object factory function PyCode_New will intern any string object that contains only ASCII letters, digits or underscores:
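As a rough Python rendering of that criterion (a hypothetical helper of my own for illustration, not the C code itself):

```python
import string

# Characters allowed by the criterion: ASCII letters, digits and underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + '_')


def all_name_chars(s):
    """Return True if every character of s is an ASCII letter, digit or underscore."""
    return all(ch in NAME_CHARS for ch in s)


print(all_name_chars('so'))        # True
print(all_name_chars('hello'))     # True
print(all_name_chars('so much'))   # False: contains a space
```

Strings that pass this kind of check are the ones eligible for interning as code constants; anything with spaces or punctuation is left alone.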
Since you created strings that fit that criterion, they are interned, which is why you see the same ID being used for the 'so' string in your second test: as long as a reference to the interned version survives, interning will cause future 'so' literals to reuse the interned string object, even in new code blocks and bound to different identifiers. In your first test, you don't save a reference to the string, so the interned strings are discarded before they can be reused.
Incidentally, your new name so = 'so' binds a string to a name that contains the same characters. In other words, you are creating a global whose name and value are equal. As Python interns both identifiers and qualifying constants, you end up using the same string object for both the identifier and its value.
If you create strings that are either not code object constants, or contain characters outside of the letters + numbers + underscore range, you'll see the id() value not being reused.
The Python peephole optimizer does pre-calculate the results of simple expressions, but if this results in a sequence longer than 20 the output is ignored (to prevent bloating code objects and memory use); so concatenating shorter strings consisting only of name characters can still lead to interned strings if the result is 20 characters or shorter.
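The pre-calculation of simple expressions can be observed by looking at the constants of a compiled expression. This is a sketch assuming CPython; the example strings are arbitrary, and which part of the compiler does the folding (and the size limit it applies) differs between Python versions.

```python
# Compile a concatenation of two short literals without running it.
code = compile("'so' + '_far'", '<sketch>', 'eval')

# The already-concatenated result is stored as a constant of the code object.
# Older versions may also keep the original operands in the tuple.
print(code.co_consts)

# Evaluating the code simply loads that precomputed constant.
result = eval(code)
print(result)
```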
This behavior is specific to the Python interactive shell. If I put the following in a .py file:
and execute it, I receive the following output:
In CPython, a string literal is treated as a constant, which we can see in the bytecode of the snippet above:
The same constant (i.e. the same string object) is loaded 3 times, so the IDs are the same.
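The same effect can be checked without a separate file by compiling a small source string as one module-level block. This is a sketch assuming CPython; the source text and the names ids and namespace are illustrative.

```python
# Three uses of the same literal inside a single compiled block.
source = "ids = [id('so'), id('so'), id('so')]"
code = compile(source, '<sketch>', 'exec')

# The literal is stored once in the constants of the code object.
print(code.co_consts.count('so'))

# Run the block and collect the three ids it computed.
namespace = {}
exec(code, namespace)
ids = namespace['ids']
print(len(set(ids)))   # 1: every call saw the same string object
```

Because all three calls load the one constant from the same code object, the ids agree, unlike in the REPL where each line is compiled into its own short-lived code object.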