I am opening a file which has 100,000 URLs. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and so far I have looked at the many confusing ways Python implements threading/concurrency. I have even looked at the python concurrence library, but cannot figure out how to write this program correctly. Has anyone come across a similar problem? I guess generally I need to know how to perform thousands of tasks in Python as fast as possible - I suppose that means 'concurrently'.
Use grequests; it's a combination of the requests and Gevent modules.

GRequests allows you to use Requests with Gevent to make asynchronous HTTP requests easily.
Usage is simple: create a set of unsent Requests, then send them all at the same time, as in the sketch below.
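A minimal sketch of that workflow, assuming the URLs live one per line in a file called urls.txt and a gevent pool size of 100 (both are assumptions, not from the original answer):

```python
import grequests

# Read the URLs (urls.txt is an assumed filename).
urls = [line.strip() for line in open('urls.txt') if line.strip()]

# Create a set of unsent Requests:
rs = (grequests.get(u) for u in urls)

# Send them all at the same time (size caps concurrent connections)
# and print the status codes:
for response in grequests.map(rs, size=100):
    if response is not None:
        print response.status_code, response.url
    else:
        print 'error'
```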
This twisted async web client goes pretty fast.
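The code for that client was not carried over; what follows is only a rough sketch of one way to write such a client with Twisted's Agent, capping concurrency with a Cooperator. The filename urls.txt and the limit of 100 concurrent requests are assumptions.

```python
import sys

from twisted.internet import defer, reactor, task
from twisted.web.client import Agent

CONCURRENCY = 100  # assumed cap on simultaneous requests
agent = Agent(reactor)

def check(url):
    # Fire a GET and print the status code when the headers arrive.
    # A production client should also read or discard the response body
    # (response.deliverBody) so the connection is released promptly.
    d = agent.request('GET', url)
    d.addCallback(lambda response: sys.stdout.write('%d %s\n' % (response.code, url)))
    d.addErrback(lambda failure: sys.stdout.write('error %s\n' % url))
    return d

def main():
    # urls.txt is an assumed filename.
    urls = [line.strip() for line in open('urls.txt') if line.strip()]
    work = (check(u) for u in urls)
    # CONCURRENCY cooperative tasks share one generator, so at most
    # CONCURRENCY requests are in flight at any moment.
    coop = task.Cooperator()
    done = defer.DeferredList([coop.coiterate(work) for _ in range(CONCURRENCY)])
    done.addCallback(lambda _: reactor.stop())

reactor.callWhenRunning(main)
reactor.run()
```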
Threads are absolutely not the answer here. They will provide both process and kernel bottlenecks, as well as throughput limits that are not acceptable if the overall goal is "the fastest way".
A little bit of twisted and its asynchronous HTTP client would give you much better results.

The easiest way would be to use Python's built-in threading library. They're not "real"/kernel threads, and they have issues (like serialization), but they're good enough. You'd want a queue & thread pool. One option is here, but it's trivial to write your own. You can't parallelize all 100,000 calls, but you can fire off 100 (or so) of them at the same time.
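A minimal sketch of such a thread pool, using multiprocessing.dummy (a thread-backed Pool available since Python 2.6) and urllib2; the pool size of 100 and the filename urls.txt are assumptions:

```python
import urllib2
from multiprocessing.dummy import Pool  # a pool of threads, not processes

def get_status(url):
    try:
        # timeout keeps one slow server from stalling a worker forever
        return url, urllib2.urlopen(url, timeout=10).getcode()
    except urllib2.HTTPError as e:
        return url, e.code      # 4xx/5xx responses still carry a status code
    except Exception:
        return url, 'error'     # DNS failures, timeouts, refused connections

# urls.txt is an assumed filename.
urls = [line.strip() for line in open('urls.txt') if line.strip()]

pool = Pool(100)  # assumed size: roughly 100 requests in flight at once
for url, status in pool.imap_unordered(get_status, urls):
    print status, url
pool.close()
pool.join()
```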
They're not "real" / kernel threadsThey have issues (like serialization), but are good enough. You'd want a queue & thread pool. One option is here, but it's trivial to write your own. You can't parallelize all 100,000 calls, but you can fire off 100 (or so) of them at the same time.Consider using Windmill , although Windmill probably cant do that many threads.
You could do it with a hand-rolled Python script on 5 machines, each one connecting outbound using ports 40000-60000, opening 100,000 port connections.
Also, it might help to do a sample test with a nicely threaded QA app such as OpenSTA in order to get an idea of how much each server can handle.
Also, try looking into just using simple Perl with the LWP::ConnCache class. You'll probably get more performance (more connections) that way.
A solution:
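The code and timing numbers for this answer did not survive extraction; in their place, here is a sketch of one possible solution along the lines discussed above: a Queue feeding worker threads that send HEAD requests with httplib, so only the status line and headers are transferred. The worker count of 200, the 10-second timeout, and the filename urls.txt are all assumptions.

```python
import httplib
import sys
from Queue import Queue
from threading import Thread
from urlparse import urlparse

CONCURRENT = 200  # assumed number of worker threads

def get_status(ourl):
    try:
        url = urlparse(ourl)
        # HEAD avoids downloading the body; we only want the status code.
        conn = httplib.HTTPConnection(url.netloc, timeout=10)
        conn.request('HEAD', url.path or '/')
        return conn.getresponse().status, ourl
    except Exception:
        return 'error', ourl

def worker():
    while True:
        url = q.get()
        status, url = get_status(url)
        print status, url
        q.task_done()

q = Queue(CONCURRENT * 2)  # bounded so the feeder can't race far ahead
for _ in range(CONCURRENT):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

try:
    # urls.txt is an assumed filename.
    for line in open('urls.txt'):
        if line.strip():
            q.put(line.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
```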
Test time:
Ping time: