Thread memory usage keeps increasing

Posted 2019-07-21 16:44

I am trying to visit a list of webpages and check whether each site offers a way to contact the owner.

Here is the full code: http://pastebin.com/12rLXQaz

This is the function that each thread calls:

def getpage():
    try:
        curl = urls.pop(0)
        print "working on " +str(curl)
        thepage1 = requests.get(curl).text
        global ctot
        if "Contact Us" in thepage1:
            slist.write("\n" +curl)
            ctot = ctot + 1
    except:
        pass
    finally:
        if len(urls)>0 :
            getpage()  

But the memory usage of the program (pythonw.exe) keeps increasing.

Since each thread just calls the function again while the condition is true, the program's memory usage should stay at roughly the same level.

For a list containing about 100k URLs, the program is using more than 3 GB and still growing...

2 Answers
SAY GOODBYE
answered 2019-07-21 17:05

I had a look at your code: http://pastebin.com/J4Rd3NhA

I would use join to cap the number of live threads at 100:

for xd in range(0,noofthreads):
    t = threading.Thread(target=getpage)
    t.daemon = True
    t.start()
    tarray.append(t)
    # my additional code
    if len(tarray) >= 100:
        tarray[-100].join()

How does this perform? If something is wrong, tell me.
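To make the idea above concrete, here is a minimal, self-contained sketch of the same throttling pattern (Python 3, with small hypothetical numbers and a `time.sleep` stand-in for the real `requests.get` work):

```python
import threading
import time

noofthreads = 20   # total threads to launch (small demo value)
MAX_LIVE = 5       # rough cap on simultaneously live threads

def getpage():
    # stand-in for the real work (requests.get(...) in the original code)
    time.sleep(0.01)

tarray = []
for _ in range(noofthreads):
    t = threading.Thread(target=getpage)
    t.daemon = True
    t.start()
    tarray.append(t)
    # once MAX_LIVE threads exist, wait for an older one to finish
    # before spawning the next, keeping the live count bounded
    if len(tarray) >= MAX_LIVE:
        tarray[-MAX_LIVE].join()

# wait for the stragglers so the program doesn't exit early
for t in tarray:
    t.join()
```

Joining `tarray[-MAX_LIVE]` only guarantees that *that particular* older thread has finished, so the live count is an approximate bound, not an exact one — but it is usually enough to keep memory in check.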

chillily
answered 2019-07-21 17:10

Your program is recursive for no reason. Because the function never returns, every page you fetch creates a new stack frame with its own set of local variables, and those variables stay referenced by the still-active frames. The garbage collector therefore never gets a chance to reclaim them, and the program keeps eating memory forever.

Read up on the while statement; it's the one you want to use instead of recursion here.

def getpage():
    global ctot
    while len(urls) > 0:
        try:
            curl = urls.pop(0)
            thepage1 = requests.get(curl).text
            if "Contact Us" in thepage1:
                slist.write("\n" + curl)
                ctot = ctot + 1
        except:
            pass
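A fully runnable sketch of this iterative approach, using a `queue.Queue` so the pop is thread-safe and a lock around the shared counter (Python 3; the URL list, the `"page"` substring check, and the thread count are hypothetical stand-ins for the original `requests.get` / `"Contact Us"` logic):

```python
import queue
import threading

# hypothetical work list standing in for the real 100k URLs
url_queue = queue.Queue()
for i in range(100):
    url_queue.put("http://example.com/page%d" % i)

ctot = 0
ctot_lock = threading.Lock()

def getpage():
    global ctot
    while True:
        try:
            curl = url_queue.get_nowait()  # thread-safe pop
        except queue.Empty:
            return  # the frame is released as soon as we return
        # stand-in for: thepage1 = requests.get(curl).text
        thepage1 = curl
        if "page" in thepage1:
            with ctot_lock:  # don't lose increments between threads
                ctot += 1

threads = [threading.Thread(target=getpage) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each loop iteration reuses the same frame instead of stacking a new one, memory stays flat no matter how many URLs are processed.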