I have a task to download gigabytes of data from a website. The data is in the form of .gz files, each 45 MB in size.
The easy way to get the files is to use "wget -r -np -A files url". This downloads the data recursively, mirroring the website. The download rate is very high: 4 MB/sec.
But, just to play around, I was also using Python to build my own URL parser.
Downloading via Python's urlretrieve is damn slow, possibly four times slower than wget: the download rate is about 500 KB/sec. I use HTMLParser for parsing the href tags.
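A minimal sketch of the approach (the base URL here is a placeholder, and the real parsing is more involved):

    import urllib
    from HTMLParser import HTMLParser

    class LinkParser(HTMLParser):
        # Collect href values that point at .gz files.
        def __init__(self):
            HTMLParser.__init__(self)
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value and value.endswith(".gz"):
                        self.links.append(value)

    base = "http://example.com/data/"   # placeholder URL
    parser = LinkParser()
    parser.feed(urllib.urlopen(base).read())
    for name in parser.links:
        urllib.urlretrieve(base + name, name)   # one 45 MB .gz file per link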
I am not sure why this is happening. Are there any settings for this?
Thanks
You can use "wget -k" to convert the links in the downloaded pages to relative links.

Please show us some code. I'm pretty sure the problem is with the code and not with urlretrieve.
I've worked with it in the past and never had any speed-related issues.
Transfer speeds can be easily misleading. Could you try the following script, which simply downloads the same URL with both wget and urllib.urlretrieve? Run it a few times in case you're behind a proxy which caches the URL on the second attempt.
For small files, wget will take slightly longer due to the external process's startup time, but for larger files that should become irrelevant.
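Something along these lines (the URL and output filenames are placeholders):

    import subprocess
    import time
    import urllib

    url = "http://example.com/file.gz"   # placeholder: use one of your 45 MB files

    # Time the external wget process.
    start = time.time()
    subprocess.call(["wget", "-q", "-O", "wget.out", url])
    print "wget:               %.2f s" % (time.time() - start)

    # Time urllib.urlretrieve on the same URL.
    start = time.time()
    urllib.urlretrieve(url, "urllib.out")
    print "urllib.urlretrieve: %.2f s" % (time.time() - start)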
Since Python suggests using urllib2 instead of urllib, I ran a test between urllib2.urlopen and wget.
The result is that it takes nearly the same time for both of them to download the same file. Sometimes, urllib2 performs even better.
The advantage of wget lies in its dynamic progress bar, which shows the percentage finished and the current download speed while transferring.
The file size in my test was 5 MB. I haven't used any cache module in Python, and I am not aware of how wget works when downloading big files.
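A minimal sketch of such a timing test (the URL is a placeholder; the file is read in chunks rather than all at once):

    import time
    import urllib2

    url = "http://example.com/file.gz"   # placeholder (~5 MB in my test)

    start = time.time()
    response = urllib2.urlopen(url)
    out = open("urllib2.out", "wb")
    while True:
        chunk = response.read(64 * 1024)   # read in 64 KB chunks
        if not chunk:
            break
        out.write(chunk)
    out.close()
    print "urllib2.urlopen: %.2f s" % (time.time() - start)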