I'm working on a sparse list implementation and recently implemented assignment via a slice. This led me to discover some behaviour in Python's built-in list implementation that I find surprising.
Given an empty list and an assignment via a slice:
>>> l = []
>>> l[100:] = ['foo']
I would have expected an IndexError from list here, because the way this is implemented means that an item can't be retrieved from the specified index:
>>> l[100]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
'foo' cannot even be retrieved from the specified slice:
>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]
l[100:] = ['foo'] appends to the list (that is, l == ['foo'] after this assignment) and appears to have behaved this way since the BDFL's initial version. I can't find this functionality documented anywhere (*) but both CPython and PyPy behave this way.
Assigning by index raises an error:
>>> l[100] = 'bar'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
So why does assigning past the end of a list via a slice not raise an IndexError (or some other error, I guess)?
To clarify following the first two comments, this question is specifically about assignment, not retrieval (cf. Why substring slicing index out of range works in Python?).
Giving in to the temptation to guess and assigning 'foo' to l at index 0 when I had explicitly specified index 100 doesn't follow the usual Zen of Python.
Consider the case where the assignment happens far away from the initialisation and the index is a variable. The caller can no longer retrieve their data from the specified location.
Assigning to a slice before the end of a list behaves somewhat differently to the example above:
>>> l = [None, None, None, None]
>>> l[3:] = ['bar']
>>> l[3:]
['bar']
(*) This behaviour is defined in Note 4 of 5.6. Sequence Types in the official documentation (thanks elethan) but it's not explained why it would be considered desirable on assignment.
Note: I understand how retrieval works and can see how it may be desirable to be consistent with this for assignment, but I am looking for a cited reason as to why assigning to a slice would behave in this way. l[100:] returning [] immediately after l[100:] = ['foo'] but l[3:] returning ['bar'] after l[3:] = ['bar'] is astonishing if you have no knowledge of len(l), particularly if you're following Python's EAFP idiom.
For indexing, an error must be raised if the given index is out of bounds, because there is no acceptable default value that could be returned. (It is not acceptable to return None, because None could be a valid element of the sequence.)
By contrast, for slicing, raising an error is not necessary if any of the indexes are out of bounds, because it is acceptable to return an empty sequence as a default value. It is also desirable to do this, because it provides a consistent way to refer to subsequences both between elements and beyond the ends of the sequence (thus allowing for insertions).
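The contrast between the two cases can be sketched in a few lines. This is a minimal illustration that assumes the standard behaviour of the built-in list; the sample values are arbitrary.

```python
a = [4, 5, 6]

# Indexing past the end has no sensible default value, so it raises.
try:
    a[100]
except IndexError as exc:
    print("IndexError:", exc)

# Slicing past the end falls back to an empty list instead.
print(a[100:])        # []

# An empty slice names a position between (or after) elements,
# so assigning to it inserts rather than replaces.
a[1:1] = [10, 11]     # insert between index 0 and index 1
a[len(a):] = [7]      # insert after the last element
print(a)              # [4, 10, 11, 5, 6, 7]
```

Both insertions go through the same slice-assignment mechanism; only the position named by the empty slice differs.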
As stated in the Sequence Types Notes, if the start or end value of a slice is greater than len(seq), then len(seq) is used instead.
So given a = [4, 5, 6], the expressions a[3:] and a[100:] both point to the empty subsequence following the last element in the list. However, after a slice assignment using these expressions, they may no longer refer to the same thing, since the length of the list may have been changed.
Thus, after the assignment a[3:] = [7], the slice a[3:] will return [7]. But after the assignment a[100:] = [8], the slice a[100:] will still return [], because len(a) is still less than 100. And given everything else stated above, this is exactly what one should expect if consistency between slice assignment and slice retrieval is to be maintained.
Let's see what is actually happening:
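A minimal sketch of such a check, written as a plain script and assuming the same empty list l as in the question:

```python
l = []
l[100:] = ['foo']   # start index 100 is clamped to len(l), which is 0

print(l[100:])      # [] -- the slice past the end is still empty
print(l)            # ['foo'] -- the item landed at index 0
```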
So the assignment was actually successful, and the item got placed into the list, as the first item.
This happens because 100: in indexing position is converted to a slice object, slice(100, None, None).
Now, the slice class has a method indices (I am not able to find its Python documentation online, though) that, when given the length of a sequence, will give a (start, stop, stride) tuple that is adjusted for the length of that sequence.
Thus when this slice is applied to a sequence of length 0, it behaves exactly like slice(0, 0, 1) for slice retrievals. For example, instead of foo[100:] throwing an error when foo is an empty sequence, it behaves as if foo[0:0:1] had been requested, which results in an empty slice on retrieval.
Now the setter code should work correctly when l[100:] is used and l is a sequence that has more than 100 elements. To make it work there, the easiest approach is to not reinvent the wheel and to just use the indices mechanism above. As a downside, it looks a bit peculiar in edge cases: slice assignments to slices that are "out of bounds" are placed at the end of the current sequence instead. (However, it turns out that there is little code reuse in the CPython code; list_ass_slice essentially duplicates all this index handling, even though it would also be available via the slice object C-API.)
Thus: if the start index of a slice is greater than or equal to the length of a sequence, the resulting slice behaves as if it were a zero-width slice starting from the end of the sequence. I.e., if a >= len(l), l[a:] behaves like l[len(l):len(l)] on built-in types. This is true for each of assignment, retrieval and deletion.
The desirability of this is that it doesn't need any exceptions. The slice.indices method doesn't need to handle any exceptions: for a sequence of length n, slice.indices(n) will always result in a (start, end, stride) tuple of indices that can be used for any of assignment, retrieval and deletion, and (at least for a positive stride) it is guaranteed that both start and end satisfy 0 <= v <= n.
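The steps above can be sketched as follows. ShowIndex is a hypothetical helper class introduced only for illustration; slice.indices is the built-in method discussed above, and the loop checks the bounds claim for positive strides only.

```python
class ShowIndex:
    """Hypothetical helper: returns whatever object it is indexed with."""
    def __getitem__(self, index):
        return index

s = ShowIndex()[100:]
print(s)                 # slice(100, None, None)

# indices(length) clamps the slice to a sequence of that length.
print(s.indices(0))      # (0, 0, 1)
print(s.indices(3))      # (3, 3, 1)
print(s.indices(150))    # (100, 150, 1)

# With the default (positive) stride, start and stop stay within 0..length.
for length in range(6):
    for start in (None, -100, -1, 0, 2, 100):
        for stop in (None, -100, -1, 0, 2, 100):
            lo, hi, step = slice(start, stop).indices(length)
            assert 0 <= lo <= length and 0 <= hi <= length

# l[a:] with a >= len(l) behaves like l[len(l):len(l)].
l = ['a', 'b']
m = ['a', 'b']
l[100:] = ['foo']
m[len(m):len(m)] = ['foo']
print(l == m)            # True
del l[100:]              # zero-width slice at the end: nothing is removed
print(l)                 # ['a', 'b', 'foo']
```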