Persistence of urllib.request connections to an HTTP server

Published 2019-01-15 10:25

Question:

I want to do some performance testing on one of our web servers, to see how the server handles a lot of persistent connections. Unfortunately, I'm not terribly familiar with HTTP and web testing. Here's the Python code I've got for this so far:

import http.client
import argparse
import threading


def make_http_connection():
    conn = http.client.HTTPConnection(options.server, timeout=30)
    conn.connect()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument("num", type=int, help="Number of connections to make (integer)")
    parser.add_argument("server", type=str, help="Server and port to connect to. Do not prepend \'http://\' for this")

    options = parser.parse_args()

    for n in range(options.num):
        connThread = threading.Thread(target = make_http_connection, args = ())
        connThread.daemon = True
        connThread.start()

    while True:
        try:
            pass
        except KeyboardInterrupt:
            break

My main question is this: How do I keep these connections alive? I've set a long timeout, but that's a very crude method and I'm not even sure it affects the connection. Would simply requesting a byte or two every once in a while do it?

(Also, on an unrelated note, is there a better procedure for waiting for a keyboard interrupt than the ugly while True: block at the end of my code?)

Answer 1:

urllib.request doesn't support persistent connections; a 'Connection: close' header is hardcoded in its source. http.client, however, partially supports persistent connections (including legacy HTTP/1.0 keep-alive), so the question title might be misleading.


I want to do some performance testing on one of our web servers, to see how the server handles a lot of persistent connections. Unfortunately, I'm not terribly familiar with HTTP and web testing.

You could use existing HTTP testing tools such as slowloris or httperf instead of writing one yourself.


How do I keep these connections alive?

To close an HTTP/1.1 connection, a client should explicitly send a Connection: close header; otherwise the connection is considered persistent by the server (though the server may close it at any moment, and http.client won't notice until it tries to read from or write to the connection).
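For example, here's a minimal sketch (example.com is just a placeholder host) showing the default persistent behaviour versus explicitly asking the server to close:

import http.client

conn = http.client.HTTPConnection('example.com', timeout=30)

# HTTP/1.1 requests are persistent by default: no special header needed
conn.request('GET', '/')
response = conn.getresponse()
response.read()  # the body must be fully read before reusing the connection

# explicitly asking the server to close the connection after this response
conn.request('GET', '/', headers={'Connection': 'close'})
response = conn.getresponse()
response.read()
conn.close()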

conn.connect() returns almost immediately, so your thread ends right after it. To force each thread to maintain an HTTP connection to the server, you could:

import http.client
import time

def make_http_connection(*args, **kwargs):
    while True:  # keep making new http connections
        h = http.client.HTTPConnection(*args, **kwargs)
        while True:  # make multiple requests over a single connection
            try:
                h.request('GET', '/')  # send request; connects on the first run
                response = h.getresponse()
                while True:  # read the response slooowly
                    b = response.read(1)  # read 1 byte
                    if not b:
                        break
                    time.sleep(60)  # wait a minute before reading the next byte
                    # note: the whole minute might pass before we notice that
                    #   the server has already closed the connection
            except Exception:
                break  # make a new connection on any error

Note: if the server returns 'Connection: close', then only a single request is made per connection.
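If you want to detect that case on the client side, here's a rough sketch (it assumes the h, response, args, and kwargs names from the loop above):

connection_header = (response.getheader('Connection') or '').lower()
if 'close' in connection_header:
    # the server will close the connection after this response,
    # so a fresh HTTPConnection is needed for the next request
    h.close()
    h = http.client.HTTPConnection(*args, **kwargs)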


(Also, on an unrelated note, is there a better procedure for waiting for a keyboard interrupt than the ugly while True: block at the end of my code?)

To wait until all threads finish or a KeyboardInterrupt happens, you could:

while threads:
    try:
        for t in threads[:]:  # iterate over a copy so we can remove items
            t.join(.1)  # timeout 0.1 seconds
            if not t.is_alive():
                threads.remove(t)
    except KeyboardInterrupt:
        break
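The snippet assumes threads is a list of the Thread objects you started, i.e., the loop in your script would collect them instead of throwing them away. A sketch based on your argparse options and the make_http_connection version above:

threads = []
for n in range(options.num):
    t = threading.Thread(target=make_http_connection,
                         args=(options.server,), kwargs={'timeout': 30})
    t.daemon = True
    t.start()
    threads.append(t)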

Or something like this:

main_thread = threading.current_thread()
while threading.active_count() > 1:
    try:
        for t in threading.enumerate():  # enumerate all alive threads
            if t is not main_thread:
                t.join(.1)
    except KeyboardInterrupt:
        break

The latter might not work for various reasons, e.g., if there are dummy threads such as threads started from C extensions without using the threading module.

concurrent.futures.ThreadPoolExecutor provides a higher level of abstraction than the threading module and can hide some of this complexity.
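For example, a minimal sketch of the same test using an executor (fetch_once is a made-up helper that makes one request per connection; you could just as well submit the keep-alive loop above):

import concurrent.futures
import http.client

def fetch_once(server):
    conn = http.client.HTTPConnection(server, timeout=30)
    conn.request('GET', '/')
    return conn.getresponse().status

# options.num / options.server come from your argparse setup
with concurrent.futures.ThreadPoolExecutor(max_workers=options.num) as executor:
    statuses = list(executor.map(fetch_once, [options.server] * options.num))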

Instead of a thread-per-connection model, you could open multiple connections concurrently in a single thread, e.g., using requests.async or gevent directly.
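A rough sketch of the gevent variant (it requires the gevent package; make_http_connection is the keep-alive loop from above):

from gevent import monkey
monkey.patch_all()  # make http.client's blocking sockets cooperative

import gevent

# run many keep-alive loops concurrently inside a single OS thread
jobs = [gevent.spawn(make_http_connection, options.server, timeout=30)
        for _ in range(options.num)]
gevent.joinall(jobs)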



Answer 2:

If "a lot" is really a lot, then you probably want to use asynchronous I/O, not threads.

requests + gevent = grequests

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

reqs = (grequests.get(u) for u in urls)
responses = grequests.map(reqs)

Requests supports persistent HTTP connections.
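With plain Requests, connection reuse happens through a Session; a minimal sketch (httpbin.org is just an example host):

import requests

with requests.Session() as session:
    for _ in range(5):
        # consecutive requests to the same host reuse the pooled TCP connection
        response = session.get('http://httpbin.org/get', timeout=30)
        print(response.status_code)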



Answer 3:

I'm going a bit outside my knowledge base here, but I would assume that your thread finishes when the function make_http_connection() completes. That is, if you want them all to stay alive, you would want to include a:

while condition:
    pass

at the end of the function. I suppose you want them all to become active at the same time? Then let the function modify a global variable and use the condition to test this value against options.num, so that the threads wait until all of them are running before they start terminating.
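A rough sketch of that idea, reusing the options object from your script (the started counter and lock are made-up names):

import threading
import time
import http.client

started = 0
started_lock = threading.Lock()

def make_http_connection():
    global started
    conn = http.client.HTTPConnection(options.server, timeout=30)
    conn.connect()
    with started_lock:
        started += 1
    # keep the connection (and this thread) alive until every thread has connected
    while started < options.num:
        time.sleep(0.1)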

Side question: guessing at what you're aiming for here, can't you just ask threading to count how many live threads there are and keep running until there are none left?

threading.active_count()

This discusses reading the keyboard, if that is what you need:

Polling the keyboard



Answer 4:

You really should be using a benchmark tool like FunkLoad to do that. If you don't have experience with HTTP, trying to do a performance test from scratch like that will certainly lead to bad results.
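For reference, a FunkLoad test is just a unittest-style class; a minimal sketch along the lines of the FunkLoad documentation (the class and method names are made up, and the server URL comes from the accompanying .conf file):

import unittest
from funkload.FunkLoadTestCase import FunkLoadTestCase

class SimpleLoadTest(FunkLoadTestCase):

    def setUp(self):
        self.server_url = self.conf_get('main', 'url')

    def test_homepage(self):
        # one logical user scenario; the bench runner replays it
        # with many concurrent virtual users
        self.get(self.server_url + '/', description='Get homepage')

if __name__ == '__main__':
    unittest.main()

You would then run it with FunkLoad's test and bench runners rather than with plain Python.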