Python copy.deepcopy of a list of lists seems shallow

Posted 2019-02-20 16:34

I am trying to initialize a list of lists representing a 3x3 array:

import copy
m = copy.deepcopy(3*[3*[0]])
print(m)
m[1][2] = 100
print(m)

and the output is:

[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 100], [0, 0, 100], [0, 0, 100]]

which is not what I expected since the last elements of each row are shared! I did get the result I need by using:

m = [ copy.deepcopy(3*[0]) for i in range(3) ]

but I don't understand why the first (and simpler) form does not work. Isn't deepcopy supposed to be deep?

3 answers
萌系小妹纸
Post #2 · 2019-02-20 17:00

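The thing is that you create a list containing the same list object 3 times, so when you assign a value through one of the rows, it affects all of them (because they are the same object). deepcopy does not undo this, because it reproduces the same sharing in the copy.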
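A quick identity check makes the sharing visible without any copying involved: `3*[3*[0]]` repeats a reference to one inner list, so all three rows are the same object. The name `rows` is only for illustration.

```python
# Build the grid the same way as in the question
rows = 3 * [3 * [0]]

# All three "rows" are references to one single list object
print(rows[0] is rows[1], rows[1] is rows[2])  # True True
print(len({id(r) for r in rows}))  # 1 distinct inner list

# Mutating through any row is visible through all of them
rows[1][2] = 100
print(rows)  # [[0, 0, 100], [0, 0, 100], [0, 0, 100]]
```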

Try this instead:

import copy
a = [3*[0] for i in range(3)]
m = copy.deepcopy(a)

Here you create "a", a list of 3 separate lists of size 3, each initialized with 0s. Deep copying "a" gives you "m": equal to "a" but a different object (with different rows), so changing "a" will not affect "m" and vice versa.
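A minimal sketch of this approach, checking that the rows of the copy are distinct objects and independent of `a`:

```python
import copy

# Three separately created rows, so nothing is shared to begin with
a = [3 * [0] for i in range(3)]
m = copy.deepcopy(a)

m[1][2] = 100
print(m)  # [[0, 0, 0], [0, 0, 100], [0, 0, 0]]
print(a)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(m[0] is m[1], m[1] is a[1])  # False False
```

Since the comprehension already builds independent rows, `a` on its own would not show the original problem; the deepcopy here only serves to get a second, independent grid.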

Anthone
Post #3 · 2019-02-20 17:06

The problem is that deepcopy keeps a memo of all objects that have already been copied. It does this to avoid infinite recursion and to preserve intentionally shared objects. So when it comes to the second sublist, it sees that it has already copied that very object (it is the same object as the first sublist) and simply inserts the copy of the first sublist again. In short, deepcopy doesn't solve the "shared sublist" problem!
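The effect of the memo can be observed directly. The sketch below passes an explicit `memo` dictionary as the optional second argument of `copy.deepcopy`; the fact that the entries are keyed by `id()` of the original objects is a CPython implementation detail, used here only for illustration.

```python
import copy

inner = [0, 0, 0]
rows = [inner, inner, inner]  # same shape as 3*[3*[0]]: one list referenced three times

memo = {}
copied = copy.deepcopy(rows, memo)

# The copy is a new outer list holding a new inner list...
print(copied is rows, copied[0] is inner)  # False False
# ...but the aliasing is reproduced: every slot holds the same new inner list
print(copied[0] is copied[1] is copied[2])  # True
# CPython detail: the memo maps id(original) -> its copy
print(memo[id(inner)] is copied[0])  # True
```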

To quote the documentation:

Two problems often exist with deep copy operations that don’t exist with shallow copy operations:

  • Recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop.
  • Because deep copy copies everything it may copy too much, such as data which is intended to be shared between copies.

The deepcopy() function avoids these problems by:

  • keeping a “memo” dictionary of objects already copied during the current copying pass; and
  • letting user-defined classes override the copying operation or the set of components copied.

(emphasis mine)
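The second point in the quote refers to the `__deepcopy__(self, memo)` hook. A minimal sketch, assuming a hypothetical `Grid` class (not part of any library) that deliberately wants each row copied independently even when the rows are aliased in the original:

```python
import copy

class Grid:
    """Hypothetical wrapper around a list of rows (illustration only)."""

    def __init__(self, rows):
        self.rows = rows

    def __deepcopy__(self, memo):
        # Register the new object first so references back to this Grid resolve
        new = Grid.__new__(Grid)
        memo[id(self)] = new
        # Deliberately give each row its own fresh memo, so rows that were
        # aliased in the original become independent lists in the copy
        new.rows = [copy.deepcopy(row, {}) for row in self.rows]
        return new

g = Grid(3 * [3 * [0]])  # aliased rows, as in the question
h = copy.deepcopy(g)
h.rows[1][2] = 100
print(h.rows)  # [[0, 0, 0], [0, 0, 100], [0, 0, 0]]
```

The trade-off is the one the documentation describes: this class gives up the memo's protection for the rows, which is fine for flat rows of numbers but would loop forever on a row that contains itself.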

That means that deepcopy regards shared references as intentional. For example, consider the class:

from copy import deepcopy

class A(object):
    def __init__(self, x):
        self.x = x
        self.x1 = x[0]  # intentional sharing of the sublist with x attribute
        self.x2 = x[1]  # intentional sharing of the sublist with x attribute

a1 = A([[1, 2], [2, 3]])
a2 = deepcopy(a1)
a2.x1[0] = 10
print(a2.x)
# [[10, 2], [2, 3]]

Setting aside that the class doesn't make much sense as is, it intentionally shares references between its x attribute and its x1 and x2 attributes. It would be weird if deepcopy broke those shared references by making a separate copy of each of them. That's why the documentation mentions this as a "solution" to the problem of copying "too much, such as data which is intended to be shared between copies".
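The same example can be checked with identity tests instead of a mutation. This repeats the class `A` from above so the snippet runs on its own:

```python
from copy import deepcopy

class A(object):
    def __init__(self, x):
        self.x = x
        self.x1 = x[0]  # intentional sharing of the sublist with x attribute
        self.x2 = x[1]  # intentional sharing of the sublist with x attribute

a1 = A([[1, 2], [2, 3]])
a2 = deepcopy(a1)

# Sharing inside the copy mirrors the sharing inside the original
print(a2.x1 is a2.x[0], a2.x2 is a2.x[1])  # True True
# ...while the copy shares nothing mutable with the original
print(a2.x is a1.x, a2.x1 is a1.x1)  # False False
```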

Back to your example: if you don't want shared references, it's better to avoid creating them in the first place:

m = [[0]*3 for _ in range(3)]
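A short check that the comprehension really produces three independent rows. The second half shows, as an aside, that if you already have an aliased grid, deep-copying each row in a separate call also breaks the sharing, because every `copy.deepcopy` call starts with its own fresh memo (this is why the workaround in the question works):

```python
import copy

# The comprehension evaluates [0]*3 anew on every iteration
m = [[0] * 3 for _ in range(3)]
print(len({id(row) for row in m}))  # 3
m[1][2] = 100
print(m)  # [[0, 0, 0], [0, 0, 100], [0, 0, 0]]

# Repairing an already aliased grid: one deepcopy call (and one memo) per row
aliased = 3 * [3 * [0]]
fixed = [copy.deepcopy(row) for row in aliased]
fixed[0][0] = 7
print(fixed)  # [[7, 0, 0], [0, 0, 0], [0, 0, 0]]
```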

In your case the inner elements are fine because 0 is immutable, but if you deal with mutable instances inside the innermost lists, you must avoid the inner list multiplication as well:

m = [[0 for _ in range(3)] for _ in range(3)] 
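To see why this matters, here is a small sketch that uses an empty list as a stand-in for "a mutable instance" in each cell:

```python
# Inner multiplication with a mutable element: the three cells of a row share one list
bad = [[[]] * 3 for _ in range(3)]
bad[0][0].append("x")
print(bad[0])  # [['x'], ['x'], ['x']]
print(bad[1])  # [[], [], []]

# Nested comprehensions create a fresh [] for every cell
good = [[[] for _ in range(3)] for _ in range(3)]
good[0][0].append("x")
print(good[0])  # [['x'], [], []]
```

The rows of `bad` are independent of each other (the outer comprehension sees to that), but the cells within each row are not.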
ら.Afraid
Post #4 · 2019-02-20 17:18

After reading several answers I thought a bit more about this question. The problem lies in the fact that there is no way a recursive (circular) object can be deeply copied element by element without the recursion going on forever! For instance:

x = [1]
x.append(x)

produces an object that behaves like an infinitely nested list [1, [1, [1, ...]]], which Python prints as:

[1, [...]]

deepcopy(x) prints the same way. In my opinion, the Python implementation adopted a solution that avoids infinite loops but may produce incorrect results for objects that have no circularities but do have shared components. I'd rather see my program loop forever and fix it than have to hunt for an obscure bug!
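For comparison, this is what `copy.deepcopy` actually returns for the circular list above: a new list whose cycle points back to the copy itself rather than to the original.

```python
import copy

x = [1]
x.append(x)  # x now contains itself
y = copy.deepcopy(x)

print(y)         # [1, [...]]
print(y is x)    # False: a new outer list was created
print(y[1] is y) # True: the cycle points at the copy, not at x
```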
