Why does assigning past the end of a list via a sl

2019-03-15 00:54发布

This question already has an answer here:

I'm working on a sparse list implementation and recently implemented assignment via a slice. This led me to discover some behaviour in Python's built-in list implementation that I find suprising.

Given an empty list and an assignment via a slice:

>>> l = []
>>> l[100:] = ['foo']

I would have expected an IndexError from list here because the way this is implemented means that an item can't be retrieved from the specified index::

>>> l[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

'foo' cannot even be retrieved from the specified slice:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]

l[100:] = ['foo'] appends to the list (that is, l == ['foo'] after this assignment) and appears to have behaved this way since the BDFL's initial version. I can't find this functionality documented anywhere (*) but both CPython and PyPy behave this way.

Assigning by index raises an error:

>>> l[100] = 'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

So why does assigning past the end of a list via a slice not raise an IndexError (or some other error, I guess)?


To clarify following the first two comments, this question is specifically about assignment, not retrieval (cf. Why substring slicing index out of range works in Python?).

Giving into the temptation to guess and assigning 'foo' to l at index 0 when I had explicitly specified index 100 doesn't follow the usual Zen of Python.

Consider the case where the assignment happens far away from the initialisation and the index is a variable. The caller can no longer retrieve their data from the specified location.

Assigning to a slice before the end of a list behaves somewhat differently to the example above:

>>> l = [None, None, None, None]
>>> l[3:] = ['bar']
>>> l[3:]
['bar']

(*) This behaviour is defined in Note 4 of 5.6. Sequence Types in the official documentation (thanks elethan) but it's not explained why it would be considered desirable on assignment.


Note: I understand how retrieval works and can see how it may be desirable to be consistent with this for assignment but am looking for a cited reason as to why assigning to a slice would behave in this way. l[100:] returning [] immediately after l[100:] = ['foo'] but l[3:] returning ['bar'] after l[3:] = ['bar'] is astonishing if you have no knowledge of len(l), particularly if you're following Python's EAFP idiom.

2条回答
成全新的幸福
2楼-- · 2019-03-15 01:10

For indexing, an error must be raised if the given index is out-of-bounds, because there is no acceptable default value that could be returned. (It is not acceptable to return None, because None could be a valid element of the sequence).

By contrast, for slicing, raising an error is not necessary if any of the indexes are out-of-bounds, because it is acceptable to return an empty sequence as a default value. And it also desirable to do this, because it provides a consistent way refer to subsequences both between elements and beyond the ends of the sequence (thus allowing for insertions).

As stated in the Sequence Types Notes, if the start or end value of a slice is greater than len(seq), then len(seq) is used instead.

So given a = [4, 5, 6], the expressions a[3:] and a[100:] both point to the empty subsequence following the last element in the list. However, after a slice assignment using these expressions, they may no longer refer to the same thing, since the length of the list may have been changed.

Thus, after the asignment a[3:] = [7], the slice a[3:] will return [7]. But after the asignment a[100:] = [8], the slice a[100:] will still return [], because len(a) is still less than 100. And given everything else stated above, this is exactly what one should expect if consistency between slice assignment and slice retrieval is to be maintained.

查看更多
男人必须洒脱
3楼-- · 2019-03-15 01:21

Let's see what is actually happening:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]
>>> l
['foo']

So the assignment was actually successful, and the item got placed into the list, as the first item.

Why this happens is because 100: in indexing position is converted to a slice object: slice(100, None, None):

>>> class Foo:
...     def __getitem__(self, i):
...         return i
... 
>>> Foo()[100:]
slice(100, None, None)

Now, the slice class has a method indices (I am not able to find its Python documentation online, though) that, when given a length of a sequence, will give (start, stop, stride) that is adjusted for the length of that sequence.

>>> slice(100, None, None).indices(0)
(0, 0, 1)

Thus when this slice is applied to a sequence of length 0, it behaves exactly like a slice slice(0, 0, 1) for slice retrievals, e.g. instead of foo[100:] throwing an error when foo is an empty sequence, it behaves as if foo[0:0:1] was requested - this will result on empty slice on retrieval.

Now the setter code should work correctly when l[100:] was used when l is a sequence that has more than 100 elements. To make it work there, the easiest is to not reinvent the wheel, and to just use the indices mechanism above. As a downside, it will now look a bit peculiar in edge cases, but slice assignments to slices that are "out of bounds" will be placed at the end of the current sequence instead. (However, it turns out that there is little code reuse in the CPython code; list_ass_slice essentially duplicates all this index handling, even though it would also be available via slice object C-API).

Thus: if start index of a slice is greater than or equal to the length of a sequence, the resulting slice behaves as if it is a zero-width slice starting from the end of the the sequence. I.e.: if a >= len(l), l[a:] behaves like l[len(l):len(l)] on built-in types. This is true for each of assignment, retrieval and deletion.

The desirability of this is in that it doesn't need any exceptions. The slice.indices method doesn't need to handle any exceptions - for a sequence of length l, slice.indices(l) will always result in (start, end, stride) of indices that can be used for any of assignment, retrieval and deletion, and it is guaranteed that both start and end are 0 <= v <= len(l).

查看更多
登录 后发表回答