Possible Duplicate:
How can I speed up fetching pages with urllib2 in python?
I have a Python script that downloads a web page, parses it, and returns some value from the page. I need to scrape a few such pages to get the final result. Each page retrieval takes a long time (5-10 s), and I'd prefer to make the requests in parallel to decrease the wait time.
The question is: which mechanism will do it quickly, correctly, and with minimal CPU/memory waste? Twisted, asyncore, threading, something else? Could you provide some links with examples?
Thanks
UPD: There are a few solutions to the problem; I'm looking for a compromise between speed and resources. If you could share some experience details (how fast it is under load in your view, etc.), it would be very helpful.
multiprocessing.Pool can be a good fit, and there are some useful examples. For example, if you have a list of URLs, you can map the content retrieval in a concurrent way:
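A minimal sketch of that approach, assuming Python 2's urllib2 (as in the linked question); the URLs and pool size are placeholders:

import urllib2
from multiprocessing import Pool

urls = ['http://example.com/page1', 'http://example.com/page2',
        'http://example.com/page3']

def fetch(url):
    # Download one page and return its contents; error handling is
    # omitted here for brevity.
    return urllib2.urlopen(url).read()

if __name__ == '__main__':
    pool = Pool(processes=4)          # 4 worker processes
    pages = pool.map(fetch, urls)     # retrieves the pages concurrently
    pool.close()
    pool.join()
    print [len(p) for p in pages]     # replace with your real parsing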
- multiprocessing: Spawn a bunch of processes, one for each URL you want to download. Use a Queue to hold the list of URLs, and make the processes each read a URL off the queue, process it, and return a value (a sketch of this follows the list).
- Use an asynchronous, i.e. event-driven rather than blocking, networking framework for this. One option is to use twisted. Another option that has recently become available is to use monocle. This mini-framework hides the complexities of non-blocking operations. See this example. It can use twisted or tornado behind the scenes, but you don't really notice much of it (a second sketch follows the list).
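A rough sketch of the Queue-based approach, again assuming Python 2 and urllib2; the URLs and worker count are placeholders:

import urllib2
from multiprocessing import Process, Queue

urls = ['http://example.com/page1', 'http://example.com/page2',
        'http://example.com/page3']

def worker(task_queue, result_queue):
    # Keep pulling URLs until we see the None sentinel.
    while True:
        url = task_queue.get()
        if url is None:
            break
        page = urllib2.urlopen(url).read()
        result_queue.put((url, len(page)))   # replace len() with your parsing

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(3)]
    for w in workers:
        w.start()
    for url in urls:
        tasks.put(url)
    for _ in workers:
        tasks.put(None)                      # one sentinel per worker
    for _ in urls:
        print results.get()
    for w in workers:
        w.join()

And a rough sketch of the event-driven route using Twisted's getPage, which fetches all pages from a single process without blocking; the URLs and the parse callback are placeholders:

from twisted.internet import reactor, defer
from twisted.web.client import getPage

urls = ['http://example.com/page1', 'http://example.com/page2']

def parse(body, url):
    # Replace this with your real parsing; here we just report the size.
    print url, len(body)
    return len(body)

def done(results):
    # results is a list of (success, value) pairs from the DeferredList.
    print results
    reactor.stop()

deferreds = [getPage(url).addCallback(parse, url) for url in urls]
defer.DeferredList(deferreds, consumeErrors=True).addCallback(done)
reactor.run()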