-->

Why are Python Ref Counts to small integers surpri

2019-03-26 07:13发布

问题:

In this answer I found a way to get a reference count of objects in Python.

They mentioned using sys.getrefcount(). I tried it, but I'm getting an unexpected result. When there is 1 reference, it seems as though the count is 20. Why is that?

I looked at the documentation but it doesn't seem to explain the reason.

回答1:

This object happens to have 20 references to it at the time of the first sys.getrefcount call. It's not just the references you created; there are all sorts of other references to it in other modules and in the Python internals, since (this is an implementation detail) the standard implementation of Python only creates one 100 object and uses it for all occurrences of 100 in a Python program.



回答2:

There a bunch of reasons for having many references to an object. Tracking down which one can be difficult, and deciding if it is worth it may bypass your level of interest. Reference counting is of primary interest to developers of debugging applications and python variants.

  • Python tries to keep only one actual value to each reference. So, the 100 you have in your example would be the same 100 that is some internal limit on recursion calls or the same 100 as a current loop index.
  • Python keeps extra references to some common objects, including low integers. The reference count to 1,234,567 is different than the count to 20.
  • Many functions memoize, and keep references, to recent arguments.
  • Some interpreters keep references to recent values and values referenced on recent lines. For example, the previous return value is stored in "_". This means running in the interpreter and running from the command line will give different answers.
  • Like all reference counting schemes, there are mistakes. For example, PyTuple_GetItem() has had some questionable choices.

The exact reference counts and meanings of those counts will be different in PyPi versus C-Python versus IPython. Reference counting is rarely a good tool for finding odd behavior in Python.



回答3:

It's fun to read the source code for Python2.7 which is very well written and clear to read. (I'm referring to version 2.7.12 if you want to play along at home.) A good place to start in understanding the code is the excellent series of lectures: C Python Internals which starts from a beginner's perspective.

The critical code (written in C) relevant to us appears in the file 'Objects/intobject.c' (I've removed some #ifdef code and slightly modified the creation of a new Integer object for clarity):

    #define NSMALLPOSINTS           257
    #define NSMALLNEGINTS           5
    static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

    PyObject *
    PyInt_FromLong(long ival)
    {
        register PyIntObject *v;
        if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
            v = small_ints[ival + NSMALLNEGINTS];
            Py_INCREF(v);
            return (PyObject *) v;
        }
        /* Inline PyObject_New */
        v = (PyIntObject *)Py_TYPE(v);
        PyObject_INIT(v, &PyInt_Type);
        v->ob_ival = ival;
        return (PyObject *) v;
    }

So essentially it creates a preset array containing all the numbers from -5 to 256 inclusive and uses those objects (increasing their reference counts using the Py_INCREF macro) if possible. If not, it will create a brand new PyInt_Type object, which is initialised with a reference count of 1.

The mystery of why every number seems to have a reference count of 3 (actually almost any new object) is only revealed when you look at the byte code that Python generates. The Virtual Machine operates with a value stack (a bit like in Forth), and each time an object is put on the value stack it increments the reference count.

So what I suspect is happening is that your code is itself providing all 3 of the references you see, since for numbers not in the list of small values, you should get a unique object. The first reference is presumably in the value stack for the caller of getrefcount when it makes the call; the second is in the local variable list for the getrefcount frame; the third is likely on the value stack in the getrefcount frame while it looks up its reference count.

A useful tool if you want to dig further into the issue are the 'compile' command, and the 'dis' command (disassemble) which is in the 'dis' module, which will together allow you to read the actual byte code generated by any piece of Python code, and should help you find out exactly when and where the third reference is being created.

As for the higher reference counts for small values, when you start Python, it automatically loads the entire Standard Library and runs quite a lot of Python module initialisation code before you get to start interpreting your own code. These modules are holding their own copies many of the small integers (and the None object which is also unique).