I am trying to take screenshots of the URLs below using Selenium, but when I run this code it is very slow.
Strangely, it sometimes runs at normal speed, but most of the time it is very slow, so I need some help.
Note that I just write the screenshots and URLs into an HTML file, so don't let that part confuse you.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options

waybackurls401 = {}
waybackurls403 = {}
webarchive_urls403 = []
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx.com/')
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx2.com/')
print "\t[~]Finding 403 status code URLs\n"

# Start headless Firefox through geckodriver
GEckodriver = 'F:/geckodriver.exe'
firefox_options = Options()
firefox_options.add_argument("-headless")
driver = webdriver.Firefox(executable_path=GEckodriver, firefox_options=firefox_options)

for x in webarchive_urls403:
    try:
        print "\t", x
        driver.get(x)
        driver.set_page_load_timeout(6)
        # Build the screenshot file name from the part of the URL after the last 'web'
        imgfilename = x.split('web')[-1]
        newfile = imgfilename.replace('/', '.') + '.png'
        driver.get_screenshot_as_file(newfile)
        # HTML fragments that are later written to the report file
        value = "<td><img src= file:///F:/master/{0} width='20%' height= '25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src= file:///F:/master/{1} width='30%' height= '20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
    except TimeoutException as ex:
        print "Can't take screenshot: timeout."
driver.quit()
EDIT:
Following Kiril's comment, I added some print statements to see where it actually stops.
import time

start = time.time()
for x in webarchive_urls403:
    print time.time() - start
    try:
        print "\t", x
        print 'test122'
        driver.get(x)
        print 'test1'
        driver.set_page_load_timeout(10)
        imgfilename = x.split('web')[-1]
        newfile = imgfilename.replace('/', '.') + '.png'
        driver.get_screenshot_as_file(newfile)
        print 'test2'
        value = "<td><img src= file:///F:/AutoRecon-master/{0} width='20%' height= '25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src= file:///F:/AutoRecon-master/{1} width='30%' height= '20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
        print 'test3'
    except TimeoutException as ex:
        print ex
driver.quit()
As you can see, I added some arbitrary print statements (for example `print 'test122'`) to see where it actually gets stuck.
I found that `test122` is printed, but `test1`, which comes right after `driver.get()`, is not.
That means the code is stuck at the `driver.get()` call.
That is the whole problem.
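
One thing that may be worth checking, given that the hang is at `driver.get()`: `set_page_load_timeout` only affects navigations that start after it has been called. In both loops above it is called after `driver.get(x)`, so the first page load runs with the driver's default limit (300 seconds in the WebDriver specification) rather than 6 or 10 seconds. The following is only a minimal sketch of setting the limit before the first navigation; it may not be the full explanation for the slowness. The names `screenshot_all`, `timeout_exc`, and `page_timeout` are made up for illustration and are not part of Selenium. The exception class is passed in as a parameter so the function can also be exercised with a stand-in driver object.

```python
import time


def screenshot_all(driver, urls, timeout_exc, page_timeout=10):
    """Visit each URL and save a screenshot, skipping pages that time out.

    driver      -- any object offering the WebDriver methods used below
    timeout_exc -- exception class raised on a page-load timeout, e.g.
                   selenium.common.exceptions.TimeoutException
    Returns a dict mapping each URL to its screenshot file name, or to
    None when the page load timed out.
    """
    # Set the limit once, before the first navigation, so that every
    # driver.get() call below is covered by it.
    driver.set_page_load_timeout(page_timeout)

    results = {}
    for url in urls:
        started = time.time()
        try:
            driver.get(url)
            # Same file-name scheme as in the question.
            name = url.split('web')[-1].replace('/', '.') + '.png'
            driver.get_screenshot_as_file(name)
            results[url] = name
        except timeout_exc:
            # Page did not finish loading in time; record it and move on.
            results[url] = None
        # Per-URL timing, to see which pages are the slow ones.
        print("%s: %.1fs" % (url, time.time() - started))
    return results
```

With the driver from the question this would be called as `screenshot_all(driver, webarchive_urls403, TimeoutException, page_timeout=10)`, followed by `driver.quit()`.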