I have some code that uses mechanize and BeautifulSoup to scrape some data from the web. The code works fine on a test machine, but the production machine is blocking the connection. The error I get is:
urlopen error [Errno 10053] An established connection was aborted by the software in your host machine
I have read through similar posts and cannot find this exact error. The site I am trying to scrape is HTTPS, but I have also had the same error occur with an HTTP site. I am using Python 2.6 and mechanize 0.2.4.
Is this due to the proxy or, as the error says, something on my local machine?
I've set mechanize up to use the system's proxy:
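import mechanize
import BeautifulSoup

br = mechanize.Browser()
# Present a browser-like User-agent header
br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)')]
br.set_proxies({}) #will use system default proxy
page = br.open(url)  # url is the address of the page to scrape
html = page.read()
soup = BeautifulSoup.BeautifulSoup(html)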
Again, this all works on my test machine, but the production machine gives that Error 10053.
The issue here was that a host-based IDS was blocking the outbound connection. Problem solved.
I added my Python script to the HIDS exception list, which is the list of programs allowed to connect out to the internet. Once the script was on the list, it had network connectivity and I had no further problems. The test machine did not have a HIDS client installed, which is why the script could connect out from there. FYI, both machines had firewalls, but only the production machine had the HIDS.
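If you want to check whether something on the host is cutting connections made by the Python process itself, independent of mechanize, a rough sketch like the one below may help. It uses only the standard `socket` module. The helper name `check_outbound` and the plain-HTTP `HEAD` probe are my own choices for illustration, not part of the original script, and a clean result here does not prove that no host-based intrusion detection is involved.

```python
import socket

# Winsock error code reported as "[Errno 10053]" on Windows.
WSAECONNABORTED = 10053


def check_outbound(host, port=80, timeout=5.0):
    """Open a TCP connection, send a tiny HTTP request, and read one byte.

    Returns (True, None) on success, or (False, errno) if the socket
    layer raised an error. errno may be None, for example on a timeout.
    """
    request = ("HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n" % host).encode("ascii")
    try:
        sock = socket.create_connection((host, port), timeout)
        try:
            sock.sendall(request)
            sock.recv(1)  # an abort often only shows up once data flows
        finally:
            sock.close()
    except socket.error as exc:
        return False, getattr(exc, "errno", None)
    return True, None
```

Calling `check_outbound("example.com")` (the host is just a placeholder) from the same interpreter that runs the scraper returns a pair such as `(False, code)`. If `code` equals `WSAECONNABORTED` on the production machine while the same call succeeds on the test machine, that points at software on the host rather than at mechanize or the proxy settings.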
HIDS stands for host-based intrusion detection system. If the network security team has hidden the HIDS from you, you might not know where to find it, and even if you do find it, you will not be able to disable it. You can ask your security team to add an exception for your script. Another sneaky way around the HIDS is to build your script into an exe (using py2exe) and rename the resulting executable to something already on the HIDS exception list. A good choice is your browser: if Firefox is allowed internet access, rename your exe to firefox.exe.