Possible Duplicate:
How can I speed up fetching pages with urllib2 in python?
I have a Python script that downloads a web page, parses it, and returns some value from the page. I need to scrape a few such pages to get the final result. Each page retrieval takes a long time (5-10 s), and I'd prefer to make the requests in parallel to decrease the wait time.
The question is: which mechanism will do it quickly, correctly, and with minimal CPU/memory waste? Twisted, asyncore, threading, something else? Could you provide some links with examples?
Thanks
UPD: There are a few solutions to the problem; I'm looking for a compromise between speed and resources. If you could share some experience details (how fast it is under load in your view, etc.), it would be very helpful.
multiprocessing.Pool can be a good fit, and there are some useful examples. For example, if you have a list of URLs, you can map the content retrieval in a concurrent way:
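A minimal sketch of that approach, assuming Python 2's urllib2 (as in the linked question); the URLs and pool size are placeholders:

import urllib2
from multiprocessing import Pool

urls = ['http://example.com/page1', 'http://example.com/page2',
        'http://example.com/page3']

def fetch(url):
    # Download one page and return its contents; error handling is
    # omitted here for brevity.
    return urllib2.urlopen(url).read()

if __name__ == '__main__':
    pool = Pool(processes=4)          # 4 worker processes
    pages = pool.map(fetch, urls)     # retrieves the pages concurrently
    pool.close()
    pool.join()
    print [len(p) for p in pages]     # replace with your real parsing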
- multiprocessing: Spawn a bunch of processes, one for each URL you want to download. Use a Queue to hold the list of URLs, and make the processes each read a URL off the queue, process it, and return a value (a sketch of this follows the list).
- Use an asynchronous, i.e. event-driven rather than blocking, networking framework for this. One option is to use twisted. Another option that has recently become available is to use monocle. This mini-framework hides the complexities of non-blocking operations. See this example. It can use twisted or tornado behind the scenes, but you don't really notice much of it (a second sketch follows the list).
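A rough sketch of the Queue-based approach, again assuming Python 2 and urllib2; the URLs and worker count are placeholders:

import urllib2
from multiprocessing import Process, Queue

urls = ['http://example.com/page1', 'http://example.com/page2',
        'http://example.com/page3']

def worker(task_queue, result_queue):
    # Keep pulling URLs until we see the None sentinel.
    while True:
        url = task_queue.get()
        if url is None:
            break
        page = urllib2.urlopen(url).read()
        result_queue.put((url, len(page)))   # replace len() with your parsing

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(3)]
    for w in workers:
        w.start()
    for url in urls:
        tasks.put(url)
    for _ in workers:
        tasks.put(None)                      # one sentinel per worker
    for _ in urls:
        print results.get()
    for w in workers:
        w.join()

And a rough sketch of the event-driven route using Twisted's getPage, which fetches all pages from a single process without blocking; the URLs and the parse callback are placeholders:

from twisted.internet import reactor, defer
from twisted.web.client import getPage

urls = ['http://example.com/page1', 'http://example.com/page2']

def parse(body, url):
    # Replace this with your real parsing; here we just report the size.
    print url, len(body)
    return len(body)

def done(results):
    # results is a list of (success, value) pairs from the DeferredList.
    print results
    reactor.stop()

deferreds = [getPage(url).addCallback(parse, url) for url in urls]
defer.DeferredList(deferreds, consumeErrors=True).addCallback(done)
reactor.run()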