Experimenting with some code and doing some microbenchmarks, I just found out that using the float function on a string containing an integer number is a factor of 2 faster than using int on the same string.
>>> python -m timeit int('1')
1000000 loops, best of 3: 0.548 usec per loop
>>> python -m timeit float('1')
1000000 loops, best of 3: 0.273 usec per loop
It gets even stranger when testing int(float('1')), whose runtime is shorter than that of the bare int('1').
>>> python -m timeit int(float('1'))
1000000 loops, best of 3: 0.457 usec per loop
I tested the code under Windows 7 running CPython 2.7.6 and under Linux Mint 16 with CPython 2.7.6.
I have to add that only Python 2 is affected; Python 3 shows a much smaller (unremarkable) difference between the runtimes.
I know that the information I get from such microbenchmarks is easy to misuse, but I'm curious why there is such a difference in the functions' runtimes.
I tried to find the implementations of int and float, but I could not find them in the sources.
This is not a full answer, just some data and observations.
Profiling results from x86-64 Arch Linux, Python 2.7.14, on a 3.9GHz Skylake i7-6700k running Linux 4.15.8-1-ARCH.
float: 0.0854 usec per loop.
int: 0.196 usec per loop. (So about a factor of 2.)

float
IDK why the heck Python is messing around with the x87 control word, but yes, the tiny _Py_get_387controlword function really runs fnstcw WORD PTR [rsp+0x6] and then reloads it into eax as an integer return value with movzx, but it probably spends more of its time writing and checking the stack canary from -fstack-protector-strong.

It's weird because _Py_dg_strtod uses SSE2 (cvtsi2sd xmm1, rsi) for FP math, not x87. (The hot part with this input is mostly integer, but there are mulsd and divsd in there.) x86-64 code normally only uses x87 for long double (80-bit float). dg_strtod stands for David Gay's string to double; there is an interesting blog post about how it works under the hood.

Note that this function only takes 9% of the total run time. The rest is basically interpreter overhead, compared to a C loop that called strtod in a loop and threw away the result.

int
Notice that PyEval_EvalFrameEx takes 13% of the total time for int, vs. 30% of the total for float. That's about the same absolute time, and PyString_FromFormatV is taking twice as much time. Plus more functions taking more small chunks of time.

I haven't figured out what PyInt_FromString does, or what it's spending its time on. 7% of its cycle counts are charged to a movdqu xmm0, [rsi] instruction near the start, i.e. loading a 16-byte arg that was passed by reference (as the 2nd function arg). This may be getting more counts than it deserves if whatever stored that memory was slow to produce it. (See this Q&A for more about how cycle counts get charged to instructions on out-of-order execution Intel CPUs, where lots of different work is in flight every cycle.) Or maybe it's getting counts from a store-forwarding stall if that memory was written recently with separate narrower stores.

It's surprising that strlen is taking so much time. From looking at the instruction profile within it, it's getting short strings, but not exclusively 1-byte strings. It looks like a mix of len < 32 bytes and 32 <= len < 64 bytes. It might be interesting to set a breakpoint in gdb and see what args are common.

The float version has a strchr (maybe looking for a . decimal point?), but no strlen of anything. It's surprising that the int version has to redo a strlen inside the loop at all.

The actual PyOS_strtoul function takes 2% of the total time, run from PyInt_FromString
(3% of the total time). These are "self" times, not including their children, so allocating memory and deciding on the number base is taking more time than parsing the single digit.

An equivalent loop in C would run ~50x faster (or maybe 20x if we're generous), calling strtoul on a constant string and discarding the result.
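For reference, here is a minimal sketch (my own illustration, not from the original answer) of the kind of C loop meant:

/* Call strtoul() on a constant one-digit string many times and accumulate
 * the results so the compiler can't optimise the calls away entirely. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long sum = 0;
    for (long i = 0; i < 10000000L; i++) {
        const char *volatile s = "1";   /* volatile: re-read the pointer each pass */
        sum += strtoul(s, NULL, 10);
    }
    printf("%lu\n", sum);               /* use the result so the loop isn't removed */
    return 0;
}

Timing this (e.g. with time ./a.out) and dividing by the iteration count gives a per-call cost to compare against the interpreter numbers above.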
int with explicit base
For some reason this is as fast as float.

The profile by function looks pretty similar to the float version, too.
int() has to account for more possible types to convert from than float() has to. When you pass a single object to int(), various things are tested for:

1. Is it already an exact int object? Then it is used as-is.
2. Does it have an __int__ method? Then call it and use the result.
3. Is it a subclass of int? Then reach in and convert the C integer value in the structure to an int() object.
4. Does it have a __trunc__ method? Then call it and use the result.

None of these tests are executed when you pass in a base argument; the code then jumps straight to converting a string to an int, with the selected base. That's because there are no other accepted types, not when there is a base given.
As a result, when you pass in a base, creating an integer from a string is suddenly a lot faster.
When you pass a string to float(), the first test made is to see if the argument is a string object (and not a subclass), at which point it is being parsed. There's no need to test other types.
So the int('1') call makes a few more tests than int('1', 10) or float('1'). Of those tests, tests 1, 2, and 3 are quite fast; they are just pointer checks. But the fourth test uses the C equivalent of getattr(obj, '__trunc__'), which is relatively expensive. It has to test the instance and the full MRO of the string's type, there is no cache, and in the end it raises an AttributeError(), formatting an error message that no one will ever see. All of that work is pretty useless here.
In Python 3, that getattr() call has been replaced with code that is a lot faster. That's because in Python 3 there is no need to account for old-style classes, so the attribute can be looked up directly on the type of the instance (the class, the result of type(instance)), and class attribute lookups across the MRO are cached at this point. No exceptions need to be created.
float() objects implement the __int__ method, which is why int(float('1')) is faster; you never hit the __trunc__ attribute test at step 4, as step 2 produced the result instead.
If you wanted to look at the C code, for Python 2, look at the int_new() method first. After parsing the arguments, the code essentially does this:
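(Paraphrased from int_new() in CPython 2.7's Objects/intobject.c; error handling is trimmed, and -909 is the sentinel value meaning no base argument was given.)

if (x == NULL)                       /* int() called with no argument */
    return PyInt_FromLong(0L);
if (base == -909)                    /* int(x): no explicit base */
    return PyNumber_Int(x);          /* run the generic conversion tests */
if (PyString_Check(x))               /* int(x, base) with a str argument */
    return PyInt_FromString(PyString_AS_STRING(x), NULL, base);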
The no-base case calls the PyNumber_Int() function, which does this:
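(Paraphrased from PyNumber_Int() in CPython 2.7's Objects/abstract.c; declarations and error handling are trimmed, but the order of the tests is what matters here.)

if (PyInt_CheckExact(o)) {                    /* test 1: exactly an int */
    Py_INCREF(o);
    return o;
}
m = Py_TYPE(o)->tp_as_number;
if (m && m->nb_int)                           /* test 2: has __int__ */
    return m->nb_int(o);
if (PyInt_Check(o))                           /* test 3: int subclass */
    return PyInt_FromLong(((PyIntObject *)o)->ob_ival);
trunc_func = PyObject_GetAttr(o, trunc_name); /* test 4: __trunc__ lookup
                                                 (trunc_name is the interned
                                                 string "__trunc__") */
if (trunc_func) {
    /* call __trunc__ and convert its result ... */
}
PyErr_Clear();                  /* swallow the AttributeError raised above */
if (PyString_Check(o))          /* only now is a plain str finally parsed */
    return int_from_string(PyString_AS_STRING(o), PyString_GET_SIZE(o));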
where int_from_string() is essentially a wrapper for PyInt_FromString(string, length, 10), so parsing the string with base 10.
In Python 3, intobject was removed, leaving only longobject, renamed to int() on the Python side. In the same vein, unicode has replaced str. So now we look at long_new(), and testing for a string is done with PyUnicode_Check() instead of PyString_Check():
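(Paraphrased from long_new() in Python 3's Objects/longobject.c; the exact shape differs a little between 3.x releases.)

if (x == NULL)                          /* int() called with no argument */
    return PyLong_FromLong(0L);
if (obase == NULL)                      /* int(x): no explicit base */
    return PyNumber_Long(x);            /* run the generic conversion tests */

base = PyNumber_AsSsize_t(obase, NULL);
/* ... range checks on base (0 or 2..36) ... */
if (PyUnicode_Check(x))                 /* str argument: parse directly */
    return PyLong_FromUnicodeObject(x, (int)base);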
So again, when no base is set, we need to look at PyNumber_Long(), which executes:
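(Paraphrased from PyNumber_Long() in Python 3's Objects/abstract.c; again simplified, and details vary by 3.x version.)

if (PyLong_CheckExact(o)) {                   /* already an exact int */
    Py_INCREF(o);
    return o;
}
m = Py_TYPE(o)->tp_as_number;
if (m && m->nb_int)                           /* has __int__ */
    return m->nb_int(o);                      /* (result checks omitted) */
trunc_func = _PyObject_LookupSpecial(o, &PyId___trunc__);
/* &PyId___trunc__ comes from _Py_IDENTIFIER(__trunc__); the lookup is cached */
if (trunc_func) {
    /* call __trunc__ and convert its result ... */
}
if (PyUnicode_Check(o))                       /* str: parse with base 10 */
    return PyLong_FromUnicodeObject(o, 10);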
Note the _PyObject_LookupSpecial() call; this is the special method lookup implementation. It eventually uses _PyType_Lookup(), which uses a cache; since there is no str.__trunc__ method, that cache will forever return a null after the first MRO scan. This lookup also never raises an exception; it just returns either the requested method or a null.

The way float() handles strings is unchanged between Python 2 and 3, so you only need to look at the Python 2 float_new() function, which for strings is pretty straightforward:
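(Paraphrased from float_new() in CPython 2.7's Objects/floatobject.c, after the argument parsing.)

/* If it's a string, but not a string subclass, parse it directly. */
if (PyString_CheckExact(x))
    return PyFloat_FromString(x, NULL);
return PyNumber_Float(x);           /* everything else, incl. str subclasses */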
So for string objects, we jump straight to parsing; otherwise we use PyNumber_Float() to look for actual float objects, or things with a __float__ method, or for string subclasses.
This does reveal a possible optimisation: if int() were to first test for PyString_CheckExact() before all those other type tests, it would be just as fast as float() when it comes to strings. PyString_CheckExact() rules out a string subclass that has an __int__ or __trunc__ method, so it is a good first test.
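As a purely hypothetical sketch (this is not actual CPython code), the reordering would amount to adding an exact-string fast path at the top of the conversion:

/* Hypothetical fast path, placed before the existing tests 1-4: */
if (PyString_CheckExact(o))                   /* exact str: parse directly */
    return int_from_string(PyString_AS_STRING(o), PyString_GET_SIZE(o));
/* ... the existing tests and remaining checks would follow unchanged ... */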
To address other answers blaming this on base parsing (so looking for a 0b, 0o, 0 or 0x prefix, case insensitively): the default int() call with a single string argument does not look for a base prefix; the base is hardcoded to 10. It is an error to pass in a string with such a prefix in that case (for example, int('0x10') raises a ValueError).

Base prefix parsing is only done if you explicitly set the second argument to 0 (for example, int('0x10', 0) returns 16). Because no testing is done for __trunc__, the base=0 prefix-parsing case is just as fast as setting base explicitly to any other supported value.
int has lots of possible bases (*, 0*, 0x*, 0b*, 0o*) and the result can be a long, so it takes time to determine the base and other things. If the base is set, it saves a lot of time.

As @Martijn Pieters mentions, the code is in Objects/intobject.c (int_new) and Objects/floatobject.c (float_new).