Is it possible to have an actual memory leak in Python?

Posted 2019-01-30 18:07

I don't have a code example, but I'm curious whether it's possible to write Python code that results in essentially a memory leak.

6 answers
我命由我不由天
Answer #2 · 2019-01-30 18:16

It is possible, yes.

It depends on what kind of memory leak you are talking about. Within pure Python code, it's not possible to "forget to free" memory as you can in C, but it is possible to leave a reference hanging somewhere. Some examples:

an unhandled traceback object that is keeping an entire stack frame alive, even though the function is no longer running

import sys

while game.running():
    try:
        key_press = handle_input()
    except SomeException:
        etype, evalue, tb = sys.exc_info()
        # Do something with tb like inspecting or printing the traceback

In this silly game-loop example, we assigned tb to a local. We had good intentions, but tb holds frame information about the stack of whatever was happening in handle_input, all the way down to whatever that called. Assuming your game keeps running, tb stays alive even through your next call to handle_input, and maybe forever. The docs for sys.exc_info now discuss this potential circular-reference issue and recommend simply not assigning tb if you don't absolutely need it. If you need the traceback as text, consider traceback.format_exc, as in the sketch below.
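A minimal sketch of that safer pattern, reusing the hypothetical game, handle_input, and SomeException names from above (logging is used here simply to have somewhere to put the text):

import logging
import traceback

while game.running():
    try:
        key_press = handle_input()
    except SomeException:
        # format_exc() renders the traceback to a string right away, so no
        # traceback or frame objects outlive this except block.
        logging.error("handle_input failed:\n%s", traceback.format_exc())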

storing values in a class or global scope instead of instance scope, and not realizing it.

This one can happen in insidious ways, but often happens when you define mutable types in your class scope.

class Money(object):
    name = ''
    symbols = []   # This is the dangerous line here

    def set_name(self, name):
        self.name = name

    def add_symbol(self, symbol):
        self.symbols.append(symbol)

In the above example, say you did

m = Money()
m.set_name('Dollar')
m.add_symbol('$')

You'll probably find this particular bug quickly: you put a mutable value at class scope, and even though you appear to access it at instance scope, the lookup actually "falls through" to the class object's __dict__, so every Money instance shares (and appends to) the same list.

Used in certain contexts, for example to hold onto objects, this can make your application's heap grow forever, and it would cause issues in, say, a production web application that doesn't restart its processes occasionally. A fix is sketched below.
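A minimal sketch of the fix, keeping the same Money class from above: initialize the mutable attribute in __init__ so each instance owns its own list, which is reclaimed together with the instance.

class Money(object):
    def __init__(self):
        self.name = ''
        self.symbols = []   # one list per instance, not shared on the class

    def set_name(self, name):
        self.name = name

    def add_symbol(self, symbol):
        self.symbols.append(symbol)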

Cyclic references in classes which also have a __del__ method.

Ironically, the existence of a __del__ used to make it impossible for the cyclic garbage collector to clean such an instance up (this was true in Python 2; since Python 3.4 and PEP 442 the collector can handle these cycles, but the advice below still applies). Say you had something where you wanted to write a destructor for finalization purposes:

class ClientConnection(...):
    def __del__(self):
        if self.socket is not None:
            self.socket.close()
            self.socket = None

Now this works fine on its own, and you may be led to believe it's being a good steward of OS resources to ensure the socket is 'disposed' of.

However, if ClientConnection keeps a reference to, say, User, and User keeps a reference to the connection, you might be tempted to say that on cleanup, the user should de-reference the connection. That is actually the flaw: the cyclic GC doesn't know a safe order in which to run the destructors, and (in Python 2) it cannot clean the cycle up at all.

The solution is to do cleanup at an explicit point, say on a disconnect event, by calling some sort of close() method, and to name that method something other than __del__. A sketch follows.
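A minimal sketch of that approach, reusing the ClientConnection and User names from above; the weakref back-reference is an added assumption that avoids creating the cycle in the first place:

import weakref

class ClientConnection(object):
    def __init__(self, user, socket):
        self._user = weakref.ref(user)  # weak back-reference, so no cycle is formed
        self.socket = socket

    def close(self):                    # explicit cleanup instead of __del__
        if self.socket is not None:
            self.socket.close()
            self.socket = None

class User(object):
    def __init__(self, socket):
        self.connection = ClientConnection(self, socket)

    def disconnect(self):
        self.connection.close()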

poorly implemented C extensions, or not properly using C libraries as they are supposed to be.

In Python, you trust the garbage collector to throw away things you aren't using. But if you use a C extension that wraps a C library, most of the time you are responsible for explicitly closing or de-allocating resources. Mostly this is documented, but a Python programmer who is used to not having to do this explicit de-allocation might throw away the handle to that library (by returning from a function, for example) without knowing that resources are still being held. A common defensive pattern is sketched below.
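One common defensive pattern, sketched with a hypothetical wrapper module clib whose open_handle and close_handle calls stand in for whatever explicit allocation and release functions the real library documents:

from contextlib import contextmanager

@contextmanager
def managed_handle(clib, *args):
    handle = clib.open_handle(*args)   # hypothetical explicit allocation
    try:
        yield handle
    finally:
        clib.close_handle(handle)      # hypothetical release, runs even on error

# Usage (hypothetical):
# with managed_handle(clib, "resource") as h:
#     clib.do_work(h)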

Scopes which contain closures which contain a whole lot more than you could've anticipated

class User:
    def set_profile(self, profile):
        def on_completed(result):
            if result.success:
                self.profile = profile

        self._db.execute(
            change={'profile': profile},
            on_complete=on_completed
        )

In this contrived example, we appear to be using some sort of 'async' call that will call us back at on_completed when the DB call is done (the implementation could have been promises; it ends up with the same outcome).

What you may not realize is that the on_completed closure binds a reference to self in order to execute the self.profile assignment. Now suppose the DB client keeps track of active queries and pointers to the closures to call when they're done (since it's async), and then it crashes for whatever reason. If the DB client doesn't correctly clean up its callbacks, it now holds a reference to on_completed, which holds a reference to the User, which holds _db: you've created a circular reference that may never get collected.

(Even without a circular reference, the fact that closures bind locals, and sometimes even instances, can keep values you thought were collected alive for a long time, which could include sockets, clients, large buffers, and entire trees of things.) One way to avoid binding the instance strongly is sketched below.
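A minimal sketch of one way to break that chain, reusing the hypothetical User class and _db.execute API from above: hold the instance through a weakref so a lingering callback cannot keep the User (and everything it owns) alive.

import weakref

class User:
    def set_profile(self, profile):
        self_ref = weakref.ref(self)   # weak, so the callback doesn't pin the User

        def on_completed(result):
            user = self_ref()          # None if the User was already collected
            if user is not None and result.success:
                user.profile = profile

        self._db.execute(
            change={'profile': profile},
            on_complete=on_completed
        )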

Default parameters which are mutable types

import time

def foo(a=[]):
    a.append(time.time())
    return a

This is a contrived example, but one could be led to believe that each call gets a fresh empty list as the default value of a, when in fact the default is a single list created once at function definition time and shared by every call. This again might cause unbounded growth without your knowing you did it. The usual fix is sketched below.
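A minimal sketch of the usual fix: make the default None and create the list inside the function body.

import time

def foo(a=None):
    if a is None:
        a = []            # a fresh list per call, not one shared at definition time
    a.append(time.time())
    return a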

Juvenile、少年°
Answer #3 · 2019-01-30 18:21

The classic definition of a memory leak is memory that was used once and is no longer needed, but has not been reclaimed. That is nearly impossible with pure Python code. But as Antoine points out, you can easily have the effect of consuming all your memory inadvertently by allowing data structures to grow without bound, even if you don't need to keep all of the data around.

With C extensions, of course, you are back in unmanaged territory, and anything is possible.

地球回转人心会变
Answer #4 · 2019-01-30 18:29

Of course you can. The typical example of a memory leak is a cache that you never flush manually and that has no automatic eviction policy, as sketched below.
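A minimal sketch of that failure mode and one common fix; compute here is a hypothetical expensive function:

import functools

_cache = {}

def expensive(x):
    # Unbounded cache: every distinct argument stays in _cache for the life of
    # the process, so memory grows without limit.
    if x not in _cache:
        _cache[x] = compute(x)
    return _cache[x]

# One fix: a bounded cache that automatically evicts least-recently-used entries.
@functools.lru_cache(maxsize=1024)
def expensive_bounded(x):
    return compute(x)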

倾城 Initia
Answer #5 · 2019-01-30 18:29

In the sense of orphaning allocated objects after they go out of scope because you forgot to deallocate them, no; Python automatically deallocates objects once they are no longer referenced (garbage collection). But in the sense that @Antoine is talking about, yes.

女痞
Answer #6 · 2019-01-30 18:33

Since many modules are written in C, yes, it is possible to have memory leaks. Imagine you are using a GUI paint drawing context (e.g. with wxPython): you can create memory buffers, but if you forget to release them you will have memory leaks. In this case, the C++ functions of the wx API are wrapped for Python.

An even worse usage: imagine you override these wx widget methods within Python without managing the underlying resources... memory leaks assured. A generic illustration of leaking through a C library is sketched below.
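As a generic illustration (not wx-specific), here is a minimal sketch of leaking memory from Python through a C library via ctypes; it assumes a POSIX system where ctypes.CDLL(None) exposes the C runtime:

import ctypes

libc = ctypes.CDLL(None)                 # the C runtime (POSIX)
libc.malloc.restype = ctypes.c_void_p

def leak(n):
    # Memory obtained directly from the C allocator is invisible to Python's
    # garbage collector; discarding the pointer leaks it for the process lifetime.
    libc.malloc(n)

for _ in range(100):
    leak(1024 * 1024)                    # roughly 100 MB leaked, unrecoverable from Python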

祖国的老花朵
Answer #7 · 2019-01-30 18:40

I create an object with a heavy attribute so that it shows up in the process memory usage.

Then I create a dictionary that refers back to the object (a reference cycle), a large number of times.

Then I delete the object and ask the GC to collect garbage. It collects none.

Then I check the process RAM footprint - it is the same.

Here you go, memory leak!

α python
Python 2.7.15 (default, Oct  2 2018, 11:47:18)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> class B(object):
...     b = list(range(1 * 10 ** 8))
...
>>>
[1]+  Stopped                 python
~/Sources/plan9port [git branch:master]
α ps aux | grep python
alexander.pugachev 85164   0.0 19.0  7562952 3188184 s010  T     2:08pm   0:03.78 /usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
~/Sources/plan9port [git branch:master]
α fg
python

>>> b = B()
>>> for i in range(1000):
...     b.a = {'b': b}
...
>>>
[1]+  Stopped                 python
~/Sources/plan9port [git branch:master]
α ps aux | grep python
alexander.pugachev 85164   0.0 19.0  7579336 3188264 s010  T     2:08pm   0:03.79 /usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
~/Sources/plan9port [git branch:master]
α fg
python


>>> b.a['b'].a
{'b': <__main__.B object at 0x109204950>}
>>> del(b)
>>> gc.collect()
0
>>>
[1]+  Stopped                 python
~/Sources/plan9port [git branch:master]
α ps aux | grep python
alexander.pugachev 85164   0.0 19.0  7579336 3188268 s010  T     2:08pm   0:05.13 /usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python