Experimenting with some code and doing some microbenchmarks, I just found out that using the float function on a string containing an integer number is a factor of 2 faster than using int on the same string.
>>> python -m timeit int('1')
1000000 loops, best of 3: 0.548 usec per loop
>>> python -m timeit float('1')
1000000 loops, best of 3: 0.273 usec per loop
It gets even stranger when testing int(float('1')), whose runtime is shorter than that of the bare int('1').
>>> python -m timeit int(float('1'))
1000000 loops, best of 3: 0.457 usec per loop
I tested the code under Windows 7 running CPython 2.7.6 and under Linux Mint 16 with CPython 2.7.6.
I have to add that only Python 2 is affected; Python 3 shows a much smaller (unremarkable) difference between the runtimes.
I know that the information I get from such microbenchmarks is easy to misuse, but I'm curious why there is such a difference in the functions' runtimes.
I tried to find the implementations of int and float, but I could not find them in the sources.
This is not a full answer, just some data and observations.
Profiling results from x86-64 Arch Linux, Python 2.7.14, on a 3.9GHz Skylake i7-6700k running Linux 4.15.8-1-ARCH.
float: 0.0854 usec per loop.
int: 0.196 usec per loop. (So about a factor of 2.)

float
IDK why the heck Python is messing around with the x87 control word, but yes, the tiny _Py_get_387controlword function really runs fnstcw WORD PTR [rsp+0x6] and then reloads it into eax as an integer return value with movzx, but it probably spends more of its time writing and checking the stack canary from -fstack-protector-strong.

It's weird because _Py_dg_strtod uses SSE2 (cvtsi2sd xmm1, rsi) for FP math, not x87. (The hot part with this input is mostly integer, but there are mulsd and divsd in there.) x86-64 code normally only uses x87 for long double (80-bit float). dg_strtod stands for David Gay's string to double; there is an interesting blog post about how it works under the hood.

Note that this function only takes 9% of the total run time. The rest is basically interpreter overhead, compared to a C loop that called strtod in a loop and threw away the result.

int
Notice that PyEval_EvalFrameEx takes 13% of the total time for int, vs. 30% of the total for float. That's about the same absolute time, and PyString_FromFormatV is taking twice as much time. Plus more functions taking more small chunks of time.

I haven't figured out what PyInt_FromString does, or what it's spending its time on. 7% of its cycle counts are charged to a movdqu xmm0, [rsi] instruction near the start, i.e. loading a 16-byte arg that was passed by reference (as the 2nd function arg). This may be getting more counts than it deserves if whatever stored that memory was slow to produce it. (See this Q&A for more about how cycle counts get charged to instructions on out-of-order execution Intel CPUs, where lots of different work is in flight every cycle.) Or maybe it's getting counts from a store-forwarding stall if that memory was written recently with separate narrower stores.

It's surprising that strlen is taking so much time. From looking at the instruction profile within it, it's getting short strings, but not exclusively 1-byte strings. It looks like a mix of len < 32 bytes and 32 <= len < 64 bytes. It might be interesting to set a breakpoint in gdb and see what args are common.

The float version has a strchr (maybe looking for a . decimal point?), but no strlen of anything. It's surprising that the int version has to redo a strlen inside the loop at all.

The actual PyOS_strtoul function takes 2% of the total time, run from PyInt_FromString
(3% of the total time). These are "self" times, not including their children, so allocating memory and deciding on the number base is taking more time than parsing the single digit.

An equivalent loop in C would run ~50x faster (or maybe 20x if we're generous), calling strtoul on a constant string and discarding the result.
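For reference, here is a minimal sketch (my own illustration, not from the original answer) of the kind of C loop meant:

/* Call strtoul() on a constant one-digit string many times and accumulate
 * the results so the compiler can't optimise the calls away entirely. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long sum = 0;
    for (long i = 0; i < 10000000L; i++) {
        const char *volatile s = "1";   /* volatile: re-read the pointer each pass */
        sum += strtoul(s, NULL, 10);
    }
    printf("%lu\n", sum);               /* use the result so the loop isn't removed */
    return 0;
}

Timing this (e.g. with time ./a.out) and dividing by the iteration count gives a per-call cost to compare against the interpreter numbers above.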
int with explicit base
For some reason this is as fast as float.

The profile by function looks pretty similar to the float version, too.
int() has to account for more possible types to convert from than float() has to. When you pass a single object to int(), various things are tested for:

1. Is it already an exact int object? Then it is used as-is.
2. Does it have an __int__ method? Then call it and use the result.
3. Is it a subclass of int? Then reach in and convert the C integer value in the structure to an int() object.
4. Does it have a __trunc__ method? Then call it and use the result.

None of these tests are executed when you pass in a base argument; the code then jumps straight to converting a string to an int, with the selected base. That's because there are no other accepted types, not when there is a base given.
As a result, when you pass in a base, creating an integer from a string is suddenly a lot faster.
When you pass a string to float(), the first test made is to see if the argument is a string object (and not a subclass), at which point it is being parsed. There's no need to test other types.
So the int('1') call makes a few more tests than int('1', 10) or float('1'). Of those tests, tests 1, 2, and 3 are quite fast; they are just pointer checks. But the fourth test uses the C equivalent of getattr(obj, '__trunc__'), which is relatively expensive. It has to test the instance and the full MRO of the string's type, there is no cache, and in the end it raises an AttributeError(), formatting an error message that no one will ever see. All of that work is pretty useless here.
In Python 3, that getattr() call has been replaced with code that is a lot faster. That's because in Python 3 there is no need to account for old-style classes, so the attribute can be looked up directly on the type of the instance (the class, the result of type(instance)), and class attribute lookups across the MRO are cached at this point. No exceptions need to be created.
float() objects implement the __int__ method, which is why int(float('1')) is faster; you never hit the __trunc__ attribute test at step 4, as step 2 produced the result instead.
If you wanted to look at the C code, for Python 2, look at the int_new() method first. After parsing the arguments, the code essentially does this:
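(Paraphrased from int_new() in CPython 2.7's Objects/intobject.c; error handling is trimmed, and -909 is the sentinel value meaning no base argument was given.)

if (x == NULL)                       /* int() called with no argument */
    return PyInt_FromLong(0L);
if (base == -909)                    /* int(x): no explicit base */
    return PyNumber_Int(x);          /* run the generic conversion tests */
if (PyString_Check(x))               /* int(x, base) with a str argument */
    return PyInt_FromString(PyString_AS_STRING(x), NULL, base);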
The no-base case calls the PyNumber_Int() function, which does this:
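(Paraphrased from PyNumber_Int() in CPython 2.7's Objects/abstract.c; declarations and error handling are trimmed, but the order of the tests is what matters here.)

if (PyInt_CheckExact(o)) {                    /* test 1: exactly an int */
    Py_INCREF(o);
    return o;
}
m = Py_TYPE(o)->tp_as_number;
if (m && m->nb_int)                           /* test 2: has __int__ */
    return m->nb_int(o);
if (PyInt_Check(o))                           /* test 3: int subclass */
    return PyInt_FromLong(((PyIntObject *)o)->ob_ival);
trunc_func = PyObject_GetAttr(o, trunc_name); /* test 4: __trunc__ lookup
                                                 (trunc_name is the interned
                                                 string "__trunc__") */
if (trunc_func) {
    /* call __trunc__ and convert its result ... */
}
PyErr_Clear();                  /* swallow the AttributeError raised above */
if (PyString_Check(o))          /* only now is a plain str finally parsed */
    return int_from_string(PyString_AS_STRING(o), PyString_GET_SIZE(o));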
where int_from_string() is essentially a wrapper for PyInt_FromString(string, length, 10), so parsing the string with base 10.
In Python 3, intobject was removed, leaving only longobject, renamed to int() on the Python side. In the same vein, unicode has replaced str. So now we look at long_new(), and testing for a string is done with PyUnicode_Check() instead of PyString_Check():
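(Paraphrased from long_new() in Python 3's Objects/longobject.c; the exact shape differs a little between 3.x releases.)

if (x == NULL)                          /* int() called with no argument */
    return PyLong_FromLong(0L);
if (obase == NULL)                      /* int(x): no explicit base */
    return PyNumber_Long(x);            /* run the generic conversion tests */

base = PyNumber_AsSsize_t(obase, NULL);
/* ... range checks on base (0 or 2..36) ... */
if (PyUnicode_Check(x))                 /* str argument: parse directly */
    return PyLong_FromUnicodeObject(x, (int)base);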
So again, when no base is set, we need to look at PyNumber_Long(), which executes:
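(Paraphrased from PyNumber_Long() in Python 3's Objects/abstract.c; again simplified, and details vary by 3.x version.)

if (PyLong_CheckExact(o)) {                   /* already an exact int */
    Py_INCREF(o);
    return o;
}
m = Py_TYPE(o)->tp_as_number;
if (m && m->nb_int)                           /* has __int__ */
    return m->nb_int(o);                      /* (result checks omitted) */
trunc_func = _PyObject_LookupSpecial(o, &PyId___trunc__);
/* &PyId___trunc__ comes from _Py_IDENTIFIER(__trunc__); the lookup is cached */
if (trunc_func) {
    /* call __trunc__ and convert its result ... */
}
if (PyUnicode_Check(o))                       /* str: parse with base 10 */
    return PyLong_FromUnicodeObject(o, 10);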
Note the _PyObject_LookupSpecial() call; this is the special method lookup implementation. It eventually uses _PyType_Lookup(), which uses a cache; since there is no str.__trunc__ method, that cache will forever return a null after the first MRO scan. This lookup also never raises an exception; it just returns either the requested method or a null.

The way float() handles strings is unchanged between Python 2 and 3, so you only need to look at the Python 2 float_new() function, which for strings is pretty straightforward:
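(Paraphrased from float_new() in CPython 2.7's Objects/floatobject.c, after the argument parsing.)

/* If it's a string, but not a string subclass, parse it directly. */
if (PyString_CheckExact(x))
    return PyFloat_FromString(x, NULL);
return PyNumber_Float(x);           /* everything else, incl. str subclasses */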
So for string objects, we jump straight to parsing; otherwise we use PyNumber_Float() to look for actual float objects, or things with a __float__ method, or for string subclasses.
This does reveal a possible optimisation: if int() were to first test for PyString_CheckExact() before all those other type tests, it would be just as fast as float() when it comes to strings. PyString_CheckExact() rules out a string subclass that has an __int__ or __trunc__ method, so it is a good first test.
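As a purely hypothetical sketch (this is not actual CPython code), the reordering would amount to adding an exact-string fast path at the top of the conversion:

/* Hypothetical fast path, placed before the existing tests 1-4: */
if (PyString_CheckExact(o))                   /* exact str: parse directly */
    return int_from_string(PyString_AS_STRING(o), PyString_GET_SIZE(o));
/* ... the existing tests and remaining checks would follow unchanged ... */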
To address other answers blaming this on base parsing (so looking for a 0b, 0o, 0 or 0x prefix, case insensitively): the default int() call with a single string argument does not look for a base prefix; the base is hardcoded to 10. It is an error to pass in a string with such a prefix in that case (for example, int('0x10') raises a ValueError).

Base prefix parsing is only done if you explicitly set the second argument to 0 (for example, int('0x10', 0) returns 16). Because no testing is done for __trunc__, the base=0 prefix-parsing case is just as fast as setting base explicitly to any other supported value.
int has lots of possible bases (*, 0*, 0x*, 0b*, 0o*) and the result can be a long, so it takes time to determine the base and other things. If the base is set, it saves a lot of time.

As @Martijn Pieters mentions, the code is in Objects/intobject.c (int_new) and Objects/floatobject.c (float_new).