It is my understanding that the `range()` function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:
```python
1000000000000000 in range(1000000000000001)
```
Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).
I have also tried things like this, but the calculation is still almost instant:
```python
1000000000000000000000 in range(0, 1000000000000000000001, 10)  # count by tens
```
If I try to implement my own range function, the result is not so nice!!
```python
def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return
```
What is the `range()` object doing under the hood that makes it so fast?
Martijn Pieters' answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for `range` to be a full-fledged sequence in Python 3, and some information/warnings regarding potential inconsistency of the `__contains__` optimization across Python implementations. abarnert's other answer goes into more detail and provides links for those interested in the history behind the optimization in Python 3 (and the lack of optimization of `xrange` in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.
To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):
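(Quoted from CPython's `Objects/rangeobject.c`; the exact text varies slightly between 3.x versions.)

```c
static int
range_contains(rangeobject *r, PyObject *ob)
{
    /* exact ints (and bools) take the O(1) arithmetic path */
    if (PyLong_CheckExact(ob) ||
        PyBool_Check(ob))
        return range_contains_long(r, ob);

    /* everything else falls back to a linear search */
    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}
```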
So for `PyLong` objects (which is `int` in Python 3), it will use the `range_contains_long` function to determine the result. And that function essentially checks whether `ob` is in the specified range (although it looks a bit more complex in C).

If it's not an `int` object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:
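(A reconstruction in Python; the names mirror the C functions above, and details are simplified.)

```python
def range_contains(range_obj, obj):
    if isinstance(obj, int):
        return range_contains_long(range_obj, obj)
    # fallback: linear search by iterating the range
    return any(obj == x for x in range_obj)

def range_contains_long(r, num):
    if r.step > 0:
        # positive step: num must satisfy start <= num < stop
        in_bounds = r.start <= num < r.stop
    else:
        # negative step: num must satisfy stop < num <= start
        in_bounds = r.stop < num <= r.start
    # num must also land exactly on a step
    return in_bounds and (num - r.start) % r.step == 0
```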
If you're wondering why this optimization was added to `range.__contains__`, and why it wasn't added to `xrange.__contains__` in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize `[x]range.__contains__`. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)

Meanwhile:
Originally, `xrange` was a not-quite-sequence object. As the 3.1 docs say:

> Range objects have very little behavior: they only support indexing, iteration, and the `len()` function.

This wasn't quite true; an `xrange` object actually supported a few other things that come automatically with indexing and `len`,* including `__contains__` (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and `xrange`/`range` claimed to implement `collections.Sequence`, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added `index` and `count` to 3.2's `range`, it also reworked the optimized `__contains__` (which shares the same math with `index`, and is directly used by `count`).** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.
\* In fact, you even get iteration for free with `len` and indexing, but in 2.3 `xrange` objects got a custom iterator, which they then lost in 3.x, which uses the same `listiterator` type as `list`.

\*\* The first version actually reimplemented it, and got the details wrong; e.g., it would give you `MyIntSubclass(2) in range(5) == False`. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow `_PySequence_IterSearch` that pre-3.2 `range.__contains__` was implicitly using when the optimization doesn't apply.

The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:
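The same `range` object can be iterated, tested for membership, and expanded to a list, all at the same time (a REPL sketch; output truncated):

```python
>>> r = range(10)
>>> for i in r:                     # iterate over r...
...     print(i, 2 in r, list(r))   # ...while testing membership and listing it
...
0 True [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 True [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 True [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
...
```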
As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not a one-time generator.
The fundamental misunderstanding here is in thinking that `range` is a generator. It's not. In fact, it's not any kind of iterator. You can tell this pretty easily:
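For example (a quick REPL check):

```python
>>> a = range(5)
>>> type(a)                # it's its own sequence type...
<class 'range'>
>>> next(a)                # ...and not an iterator
Traceback (most recent call last):
  ...
TypeError: 'range' object is not an iterator
```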
If it were a generator, iterating it once would exhaust it:
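Using the `my_crappy_range` generator from the question:

```python
>>> b = my_crappy_range(5)
>>> list(b)
[0, 1, 2, 3, 4]
>>> list(b)                # exhausted after one pass
[]
```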
What `range` actually is, is a sequence, just like a list. You can even test this, and see that it follows all the rules of being a sequence:
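For example (using `collections.abc.Sequence`; in very old 3.x versions it lived at `collections.Sequence`):

```python
>>> import collections.abc
>>> a = range(5)
>>> isinstance(a, collections.abc.Sequence)
True
>>> len(a)          # sequences have a length...
5
>>> a[3]            # ...support indexing...
3
>>> a[-1]           # ...including negative indexing...
4
>>> a[1:3]          # ...and slicing (which returns another range)
range(1, 3)
>>> a.index(4), a.count(4)   # plus the Sequence mixin methods
(4, 1)
```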
The difference between a `range` and a `list` is that a `range` is a lazy or dynamic sequence; it doesn't remember all of its values, it just remembers its `start`, `stop`, and `step`, and creates the values on demand on `__getitem__`.
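To make the laziness concrete, here is a minimal sketch of the idea (a hypothetical `LazyRange`, positive steps only; the real implementation is in C and handles far more):

```python
class LazyRange:
    """Toy lazy sequence: stores only start/stop/step, computes values on demand."""

    def __init__(self, start, stop, step=1):
        assert step > 0, "this sketch only handles positive steps"
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        # number of values in [start, stop) when counting by step
        return max(0, (self.stop - self.start + self.step - 1) // self.step)

    def __getitem__(self, index):
        if index < 0:                       # support negative indexing
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("LazyRange index out of range")
        return self.start + index * self.step


print(LazyRange(0, 10**15)[10**14])   # instant: pure arithmetic, nothing stored
print(list(LazyRange(0, 10, 3)))      # iteration comes free via __getitem__: [0, 3, 6, 9]
```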
(As a side note, if you `print(iter(a))`, you'll notice that `range` uses the same `listiterator` type as `list`. How does that work? A `listiterator` doesn't use anything special about `list` except for the fact that it provides a C implementation of `__getitem__`, so it works fine for `range` too.)

Now, there's nothing that says that `Sequence.__contains__` has to be constant time; in fact, for obvious examples of sequences like `list`, it isn't. But there's nothing that says it can't be. And it's easier to implement `range.__contains__` to just check it mathematically (`(val - start) % step`, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?

But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhary points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact, CPython 3.0-3.1 didn't include it.)
If `range` actually were a generator, like `my_crappy_range`, then it wouldn't make sense to test `__contains__` this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is `1` still `in` the generator? Should testing for `1` cause it to iterate and consume all the values up to `1` (or up to the first value `>= 1`)?

It's all about the lazy approach to evaluation, and some extra optimization of `range`. Values in ranges don't need to be computed until real use, and sometimes, thanks to the extra optimization, they never are. By the way, your integer is not that big; consider `sys.maxsize`:

```python
sys.maxsize in range(sys.maxsize)
```

is pretty fast due to the optimization: it's easy to compare the given integer with just the min and max of the range.

But:

```python
float(sys.maxsize) in range(sys.maxsize)
```

is pretty slow. (In this case, there is no optimization in `range`: since it receives an unexpected float, Python will compare all the numbers.)
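To see the difference yourself, a quick `timeit` sketch (absolute numbers vary by machine; note the float test deliberately uses a much smaller range so that it finishes):

```python
import timeit

# int membership: O(1) arithmetic, so a huge range is fine
print(timeit.timeit('sys.maxsize in r',
                    setup='import sys; r = range(sys.maxsize)',
                    number=1000))

# float membership: falls back to a linear scan, so keep the range small
print(timeit.timeit('float(10**6 - 1) in r',
                    setup='r = range(10**6)',
                    number=10))
```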
Here is a similar implementation in C#; you can see how `Contains` is done in O(1) time there as well.