Why is the implementation of startwith
slower than slicing?
In [1]: x = 'foobar'
In [2]: y = 'foo'
In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop
In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop
Surprisingly, even including calculation for the length, slicing still appears significantly faster:
In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop
Note: the first part of this behaviour is noted in Python for Data Analysis (Chapter 3), but no explanation for it is offered.
.
If helpful: here is the C code for startswith
; and here is the output of dis.dis
:
In [6]: import dis
In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))
In [8]: dis_it('x[:3]==y')
1 0 LOAD_NAME 0 (x)
3 LOAD_CONST 0 (3)
6 SLICE+2
7 LOAD_NAME 1 (y)
10 COMPARE_OP 2 (==)
13 RETURN_VALUE
In [9]: dis_it('x.startswith(y)')
1 0 LOAD_NAME 0 (x)
3 LOAD_ATTR 1 (startswith)
6 LOAD_NAME 2 (y)
9 CALL_FUNCTION 1
12 RETURN_VALUE
startswith
is more complex than slicing...This isn't a simple character compare loop for needle in beginning of haystack that's happening. We're looking at a for loop that is iterating through a vector/tuple (subobj) and calling another function (
_string_tailmatch
) on it. Multiple function calls have overhead with regards to the stack, argument sanity checks etc...startswith
is a library function while the slicing appears to be built into the language.Some of the performance difference can be explained by taking into account the time it takes the
.
operator to do its thing:Another portion of the difference can be explained by the fact that
startswith
is a function, and even no-op function calls take a bit of time:This does not totally explain the difference, since the version using slicing and
len
calls a function and is still faster (compare tosw(y)
above -- 267 ns):My only guess here is that maybe Python optimizes lookup time for built-in functions, or that
len
calls are heavily optimized (which is probably true). It might be possible to test that with a customlen
func. Or possibly this is where the differences identified by LastCoder kick in. Note also larsmans' results, which indicate thatstartswith
is actually faster for longer strings. The whole line of reasoning above applies only to those cases where the overhead I'm talking about actually matters.Calling a function is quite expensive. I don't know, however, if this is the case as well for built-in functions written in C.
Be aware, though, that slicing may involve a function call as well depending on the object which is used.
To quote the docs,
startswith
does more you might think:The comparison isn't fair since you're only measuring the case where
startswith
returnsTrue
.Also, for much longer strings,
startswith
is a lot faster:This is still true when there's no match.
So,
startswith
is probably slower for short strings because it's optimized for long ones.(Trick to get random strings taken from this answer.)