Memory leaks when image discarded in Python

Posted 2019-05-28 22:14

Question:

I'm currently writing a simple board game in Python and I just realized that garbage collection doesn't purge the discarded bitmap data from memory when images are reloaded. It happens only when the game is started or loaded, or when the resolution changes, but it multiplies the memory consumed, so I can't leave this problem unsolved.

When images are reloaded, all references are transferred to the new image data, since it is bound to the same variable that the original image data was bound to. I tried to force garbage collection by calling gc.collect(), but it didn't help.

I wrote a small sample to demonstrate my problem.

from tkinter import Button, DISABLED, Frame, Label, NORMAL, Tk
from PIL.Image import open
from PIL.ImageTk import PhotoImage

class App(Tk):
    def __init__(self):
        Tk.__init__(self)
        self.text = Label(self, text = "Please check the memory usage. Then push button #1.")
        self.text.pack()
        self.btn = Button(text = "#1", command = lambda : self.buttonPushed(1))
        self.btn.pack()

    def buttonPushed(self, n):
        "Cycle to open the Tab module n times."
        self.btn.configure(state = DISABLED) # disable to prevent parallel cycles
        if n == 100:
            self.text.configure(text = "Overwriting the bitmap with itself 100 times...\n\nCheck the memory usage!\n\nUI may seem to hang but it will finish soon.")
            self.update_idletasks()
        for i in range(n):      # creates the Tab frame with the img, destroys it, then recreates them to overwrite the previous Frame and previous img
            b = Tab(self)
            b.destroy()
            if n == 100:
                print(i+1,"percent of processing finished.")
        if n == 1:
            self.text.configure(text = "Please check the memory usage now.\nMost of the difference is caused by the bitmap opened.\nNow push button #100.")
            self.btn.configure(text = "#100", command = lambda : self.buttonPushed(100))
        self.btn.configure(state = NORMAL)  # starting cycles is enabled again       

class Tab(Frame):
    """Creates a frame with a picture in it."""
    def __init__(self, master):
        Frame.__init__(self, master = master)
        self.a = PhotoImage(open("map.png"))    # img opened, change this to a valid one to test it
        self.b = Label(self, image = self.a)
        self.b.pack()                           # Label with img appears in Frame
        self.pack()                             # Frame appears

if __name__ == '__main__':
    a = App()
    a.mainloop()    # start the Tk event loop so the window stays open

To run the code above you will need a PNG image file. My map.png's dimensions are 1062×1062. As a PNG it is 1.51 MB and as bitmap data it is about 3-3.5 MB. Use a large image to see the memory leak easily.

Expected result when you run my code: the Python process uses more memory with each cycle. When it reaches approximately 500 MB, the usage collapses, then starts to climb again.

Please give me some advice on how to solve this issue. I'm grateful for any help. Thank you in advance.

Answer 1:

First, you definitely do not have a memory leak. If it "collapses" whenever it gets near 500MB and never crosses it, it can't possibly be leaking.


And my guess is that you don't have any problem at all.

When Python's garbage collector cleans up objects (which in CPython generally happens as soon as you're done with them), it generally doesn't release the memory to the OS. Instead, it keeps the memory around in case you need it later. This is intentional: unless you're thrashing swap, it's a whole lot faster to reuse memory than to keep freeing and reallocating it.

Also, if 500MB is virtual memory, that's nothing on a modern 64-bit platform. If it's not mapped to physical/resident memory (or is mapped if the computer is idle, but quickly tossed otherwise), it's not a problem; it's just the OS being nice with resources that are effectively free.
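One way to tell "Python freed the object" apart from "the process still holds the memory" is to ask Python itself instead of the OS. The following is only a sketch, using the standard tracemalloc module and a plain bytearray as a stand-in for the decoded bitmap (the 1062×1062 RGB size is taken from the question; nothing here touches tkinter or PIL):

```python
import tracemalloc

# Track allocations made through Python's memory allocators.
tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Stand-in for the decoded image: 1062 x 1062 pixels, 3 bytes per pixel.
bitmap = bytearray(1062 * 1062 * 3)
with_bitmap, _ = tracemalloc.get_traced_memory()

# Drop the only reference; CPython frees the buffer immediately.
del bitmap
after_del, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("allocated while bitmap was alive:", with_bitmap - baseline, "bytes")
print("still allocated after del:       ", after_del - baseline, "bytes")
```

If the second number drops back near zero while Task Manager or top still shows a large process size, the memory has been freed from Python's point of view and is simply being kept for reuse.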

More importantly: what makes you think there's a problem? Is there any actual symptom, or just something in Task Manager/Activity Monitor/top/whatever that scares you? If the latter, take a look at the memory use of the other programs. On my Mac, I've got 28 programs currently running using over 400MB of virtual memory, and I'm using 11 out of 16GB, even though less than 3GB is actually wired. If I, say, fire up Logic, the memory will be collected faster than Logic can use it; until then, why should the OS waste effort unmapping memory (especially when it has no way to be sure some process won't later ask for the memory it wasn't using)?


But if there is a real problem, there are two ways to solve it.


The first trick is to do everything memory-intensive in a child process that you can kill and restart to recover the temporary memory (e.g., by using multiprocessing.Process or concurrent.futures.ProcessPoolExecutor).

This usually makes things slower rather than faster. And it's obviously not easy to do when the temporary memory is mostly things that go right into the GUI, and therefore have to live in the main process.
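As a rough sketch of the child-process trick, assuming the heavy work can be reduced to a small, picklable result: the function names below (decoded_size, run_in_child) are made up for illustration, and the bytearray is only a stand-in for real image decoding.

```python
from concurrent.futures import ProcessPoolExecutor


def decoded_size(width, height):
    """Hypothetical memory-hungry job: build a big temporary buffer and
    return only a small result (here, the buffer's length)."""
    pixels = bytearray(width * height * 3)  # stand-in for decoded RGB data
    return len(pixels)


def run_in_child(func, *args):
    """Run func(*args) in a short-lived worker process.

    The worker exits when the pool shuts down, so whatever temporary
    memory it used goes back to the OS.
    """
    with ProcessPoolExecutor(max_workers=1) as pool:
        return pool.submit(func, *args).result()


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this module (for example Windows).
    print(run_in_child(decoded_size, 1062, 1062))
```

Only the return value crosses the process boundary, which is why this doesn't help much when the large object itself (such as a PhotoImage) has to end up in the GUI process.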


The other option is to figure out where the memory's being used and not keep so many objects around at the same time. Basically, there are two parts to this:

First, release everything possible before the end of each event handler. This means calling close on files, either using del on objects or setting all references to them to None, calling destroy on GUI objects that aren't visible, and, most of all, not storing references to things you don't need. (Do you actually need to keep the PhotoImage around after you use it? If you do, is there any way you can load the images on demand?)
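To illustrate the "set all references to None" point, here is a minimal sketch that relies on CPython's reference counting. Bitmap and ImageHolder are hypothetical stand-ins, not tkinter or PIL classes; the weak reference is there only to observe the object without keeping it alive.

```python
import weakref


class Bitmap:
    """Hypothetical stand-in for a large image object."""

    def __init__(self, width, height):
        self.pixels = bytearray(width * height * 3)


class ImageHolder:
    """Hypothetical widget-like object that owns one bitmap."""

    def __init__(self):
        self.image = Bitmap(1062, 1062)

    def release(self):
        # Drop the only strong reference so the bitmap can be reclaimed.
        self.image = None


holder = ImageHolder()
probe = weakref.ref(holder.image)      # weak: does not keep the bitmap alive

alive_before = probe() is not None     # the holder still references it
holder.release()
alive_after = probe() is not None      # in CPython, freed right away

print("alive before release:", alive_before)
print("alive after release: ", alive_after)
```

If an object like this is still alive after you think you released it, something else is holding a reference to it, and that is the thing to track down.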

Next, make sure you have no reference cycles. In CPython, garbage is cleaned up immediately as long as there are no cycles—but if there are, they sit around until the cycle checker runs. You can use the gc module to investigate this. One really quick thing to do is try this every so often:

import gc

print(gc.get_count())
gc.collect()
print(gc.get_count())

If you see huge drops, you've got cycles. You'll have to look inside gc.get_objects() and gc.garbage, or attach callbacks, or just reason about your code to find exactly where the cycles are. For each one, if you don't really need references in both directions, get rid of one; if you do, change one of them into a weakref.
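A small sketch of the difference, assuming CPython: the Node class and both helper functions are invented for the example. With a strong back-reference the pair survives until the cycle collector runs; with a weak back-reference, reference counting frees it as soon as the function returns.

```python
import gc
import weakref


class Node:
    """Hypothetical parent/child object, e.g. a frame and its contents."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


def make_cycle():
    parent, child = Node("parent"), Node("child")
    parent.children.append(child)
    child.parent = parent                  # strong back-reference: a cycle
    return weakref.ref(parent)             # probe only, keeps nothing alive


def make_weak_link():
    parent, child = Node("parent"), Node("child")
    parent.children.append(child)
    child.parent = weakref.ref(parent)     # weak back-reference: no cycle
    return weakref.ref(parent)


gc.disable()                               # keep the automatic collector out of the demo
try:
    cyclic = make_cycle()
    cycle_alive_before = cyclic() is not None   # still alive: only the cycle holds it
    gc.collect()                                # the cycle checker reclaims it
    cycle_alive_after = cyclic() is not None

    weak = make_weak_link()
    weak_alive = weak() is not None             # already freed by reference counting
finally:
    gc.enable()

print(cycle_alive_before, cycle_alive_after, weak_alive)
```

Code that follows a weak back-reference has to call it (child.parent()) and handle a None result once the parent is gone.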