I am trying to visit a list of webpages and check whether each site offers a way to contact its owner.
Here is the full code: http://pastebin.com/12rLXQaz
This is the function that each thread calls:
def getpage():
    try:
        curl = urls.pop(0)
        print "working on " + str(curl)
        thepage1 = requests.get(curl).text
        global ctot
        if "Contact Us" in thepage1:
            slist.write("\n" + curl)
            ctot = ctot + 1
    except:
        pass
    finally:
        if len(urls) > 0:
            getpage()
But the program's memory usage (pythonw.exe) keeps growing. Since each thread simply calls the function again while URLs remain, I would expect memory usage to stay at roughly the same level. For a list of about 100k URLs, the program is already using more than 3 GB, and it keeps climbing.
I had a look at your code: http://pastebin.com/J4Rd3NhA
I would start 100 threads and join them all; a minimal sketch of that pattern is below. How does this perform? If something is wrong, tell me.
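Roughly like this — a sketch only, assuming the shared urls list from your code and a non-recursive getpage (a loop-based version is shown further down):

import threading

# Start 100 worker threads, then join them so the main thread
# waits until every URL has been processed.
threads = [threading.Thread(target=getpage) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()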
Your program is recursive for no reason. The recursion means that for every page you fetch, you create a new stack frame with a new set of local variables. Because the function never returns, those locals stay referenced, the garbage collector never gets a chance to reclaim them, and the program keeps eating memory forever.
Read up on the while statement; it's the one you want to use instead of recursion here.
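For illustration, here is a loop-based rewrite of your function — just a sketch, under the same assumptions as your snippet (urls, slist, and ctot defined at module level):

import requests

def getpage():
    # Loop instead of recursing: each iteration's locals go out of
    # scope when the iteration ends, so memory usage stays flat.
    global ctot
    while len(urls) > 0:
        try:
            curl = urls.pop(0)
            print "working on " + str(curl)
            thepage1 = requests.get(curl).text
            if "Contact Us" in thepage1:
                slist.write("\n" + curl)
                ctot = ctot + 1
        except Exception:
            pass

(Popping from a plain list across threads mostly gets away with it under the GIL, but if you hit race conditions, Queue.Queue is the thread-safe way to hand out work.)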