I'm investigating a problem with a Python app running on an Ubuntu machine with 4 GB of RAM. The tool will be used to audit servers (we prefer to roll our own tools). It uses threads to connect to a large number of servers, and many of the TCP connections fail. However, if I add a delay of one second between kicking off each thread, then most connections succeed. I have used this simple script to investigate what may be happening:
#!/usr/bin/python
import sys
import socket
import threading
import time

class Scanner(threading.Thread):
    def __init__(self, host, port):
        threading.Thread.__init__(self)
        self.host = host
        self.port = port
        self.status = ""

    def run(self):
        self.sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sk.settimeout(20)
        try:
            self.sk.connect((self.host, self.port))
        except Exception, err:
            self.status = str(err)
        else:
            self.status = "connected"
        finally:
            self.sk.close()

def get_hostnames_list(filename):
    return open(filename).read().splitlines()

if __name__ == "__main__":
    hostnames_file = sys.argv[1]
    hosts_list = get_hostnames_list(hostnames_file)
    threads = []
    for host in hosts_list:
        #time.sleep(1)
        thread = Scanner(host, 443)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
        print "Host: ", thread.host, " : ", thread.status
If I run this with the time.sleep(1) commented out against, say, 300 hosts, many of the connections fail with a timeout error, whereas they don't time out if I put the one-second delay in. I did try the app on another Linux distro running on a more powerful machine and there weren't as many connect errors. Is it due to a kernel limitation? Is there anything I can do to get the connections to work without putting in the delay?
UPDATE
I have also tried a program that limited the number of threads available in a pool. By reducing this to 20 I can get all connects to work, but it only checks about one host a second. So whatever I try (putting in a sleep(1) or limiting the number of concurrent threads), I don't seem to be able to check more than one host every second.
UPDATE
I just found this question, which seems similar to what I am seeing.
UPDATE
I wonder if writing this using twisted might help? Could anyone show what my example would look like written using twisted?
This variant is much faster than the code that uses gevent. Here's a version that uses t.i.d.inlineCallbacks. It requires Python 2.5 or newer, and it lets you write the asynchronous code in a synchronous (blocking) style.

How about a real thread pool?
Example:
It's in Python 3, but it shouldn't be too hard to convert to 2.x. I wouldn't be surprised if this fixes your problem.
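The answer's actual code isn't preserved here, so this is a minimal sketch of the thread-pool idea using the standard-library concurrent.futures module (the scan and scan_all names are mine):

```python
import socket
import sys
from concurrent.futures import ThreadPoolExecutor

def scan(host, port=443, timeout=20):
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sk.settimeout(timeout)
    try:
        sk.connect((host, port))
        return host, "connected"
    except OSError as err:
        return host, str(err) or type(err).__name__
    finally:
        sk.close()

def scan_all(hosts, port=443, workers=20):
    # a fixed pool of worker threads reuses threads across hosts
    # instead of spawning one thread per host all at once
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, host, port) for host in hosts]
        return [f.result() for f in futures]

if __name__ == "__main__" and len(sys.argv) > 1:
    for host, status in scan_all(open(sys.argv[1]).read().split()):
        print("Host:", host, ":", status)
```

Unlike the sleep(1) workaround, the pool doesn't slow each host down; it only bounds how many connects run concurrently, so throughput is workers divided by the average connect time rather than one host per second.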
First of all, try using non-blocking sockets. Another possible cause is that you are exhausting the ephemeral port range; try raising that limit.
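For reference, a small Linux-only sketch for checking the ephemeral port range the kernel hands out to outgoing connections (the helper name is mine; widening the range itself is done via the net.ipv4.ip_local_port_range sysctl):

```python
def ephemeral_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    # Linux exposes the range of local ports used for outgoing
    # connections in procfs; each in-flight connect consumes one
    try:
        with open(path) as f:
            low, high = map(int, f.read().split())
        return low, high
    except OSError:
        return None  # not Linux, or procfs unavailable

rng = ephemeral_port_range()
if rng:
    low, high = rng
    print("ephemeral ports: %d-%d (%d usable)" % (low, high, high - low + 1))
```

Closed sockets also linger in TIME_WAIT, so a burst of hundreds of connects can hold ports for minutes even after close().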
Python 3.4 introduces a new provisional API for asynchronous I/O: the asyncio module. This approach is similar to the twisted-based answer: as in the twisted variant, it uses a NoopProtocol that does nothing but disconnect immediately on a successful connection. The number of concurrent connections is limited using a semaphore, and the code is coroutine-based.

Example
To find out how many successful SSL connections we can make to the first 1000 hosts from the top-million Alexa list:
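The original example code isn't preserved here; this is a sketch of the described approach with stdlib asyncio, using today's async/await syntax rather than the 3.4-era yield from style (NoopProtocol follows the answer's description; check and main are my names):

```python
import asyncio
import ssl
import sys

class NoopProtocol(asyncio.Protocol):
    # hang up as soon as the connection (and TLS handshake) succeeds
    def connection_made(self, transport):
        transport.close()

async def check(host, port, sem, use_ssl=True, timeout=60):
    sslctx = ssl.create_default_context() if use_ssl else None
    async with sem:  # the semaphore caps in-flight connections
        try:
            loop = asyncio.get_running_loop()
            transport, _ = await asyncio.wait_for(
                loop.create_connection(NoopProtocol, host, port, ssl=sslctx),
                timeout)
            transport.close()
            return host, "connected"
        except Exception as err:
            return host, str(err) or type(err).__name__

async def main(hosts, port=443, limit=100):
    sem = asyncio.Semaphore(limit)
    for host, status in await asyncio.gather(
            *(check(host, port, sem) for host in hosts)):
        print("Host:", host, ":", status)

if __name__ == "__main__" and len(sys.argv) > 1:
    asyncio.run(main(open(sys.argv[1]).read().split()))
```

All sockets live on a single event loop here, so there is no per-host thread at all; only the semaphore bounds how many connects the kernel sees at once.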
The result: fewer than half of all the connections are successful. On average, it checks ~20 hosts per second. Many sites timed out after a minute, and if the host doesn't match the hostnames in the server's certificate, the connection also fails; this includes example.com vs. www.example.com-style comparisons.

You could try gevent: it can process more than one host per second.
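The gevent code itself wasn't preserved either; here is a sketch of the approach, which monkey-patches the socket module so that blocking-style code like the question's runs on cooperative greenlets instead of OS threads (gevent must be installed; scan is my name):

```python
from gevent import monkey
monkey.patch_all()  # must run before the socket module is used

import sys
import socket
import gevent

def scan(host, port=443, timeout=20):
    # looks blocking, but the patched socket yields to other greenlets
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sk.settimeout(timeout)
    try:
        sk.connect((host, port))
        return host, "connected"
    except Exception as err:
        return host, str(err) or type(err).__name__
    finally:
        sk.close()

if __name__ == "__main__" and len(sys.argv) > 1:
    jobs = [gevent.spawn(scan, host) for host in open(sys.argv[1]).read().split()]
    gevent.joinall(jobs)
    for job in jobs:
        host, status = job.value
        print("Host:", host, ":", status)
```

Greenlets are far cheaper than OS threads, which is why spawning one per host doesn't hit the resource limits the threaded version does.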