How do I download a file over HTTP using Python?

2018-12-31 01:42发布

I have a small utility that I use to download a MP3 from a website on a schedule and then builds/updates a podcast XML file which I've obviously added to iTunes.

The text processing that creates/updates the XML file is written in Python. I use wget inside a Windows .bat file to download the actual MP3 however. I would prefer to have the entire utility written in Python though.

I struggled though to find a way to actually down load the file in Python, thus why I resorted to wget.

So, how do I download the file using Python?

22条回答
春风洒进眼中
2楼-- · 2018-12-31 02:07

In Python 2, use urllib2 which comes with the standard library.

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers. The documentation can be found here.

查看更多
无色无味的生活
3楼-- · 2018-12-31 02:11

If speed matters to you, I made a small performance test for the modules urllib and wget, and regarding wget I tried once with status bar and once without. I took three different 500MB files to test with (different files- to eliminate the chance that there is some caching going on under the hood). Tested on debian machine, with python2.

First, these are the results (they are similar in different runs):

$ python wget_test.py 
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============

The way I performed the test is using "profile" decorator. This is the full code:

import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='
time.sleep(1)

wget_no_bar_test(url2)
print '=============='
time.sleep(1)

wget_with_bar_test(url3)
print '=============='
time.sleep(1)

urllib seems to be the fastest

查看更多
裙下三千臣
4楼-- · 2018-12-31 02:12

Simple yet Python 2 & Python 3 compatible way comes with six library:

from six.moves import urllib
urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
查看更多
像晚风撩人
5楼-- · 2018-12-31 02:12

Following are the most commonly used calls for downloading files in python:

  1. urllib.urlretrieve ('url_to_file', file_name)

  2. urllib2.urlopen('url_to_file')

  3. requests.get(url)

  4. wget.download('url', file_name)

Note: urlopen and urlretrieve are found to perform relatively bad with downloading large files (size > 500 MB). requests.get stores the file in-memory until download is complete.

查看更多
公子世无双
6楼-- · 2018-12-31 02:12

urlretrieve and requests.get is simple, however the reality not. I have fetched data for couple sites, including text and images, the above two probably solve most of the tasks. but for a more universal solution I suggest the use of urlopen. As it is included in Python 3 standard library, your code could run on any machine that run Python 3 without pre-installing site-par

import urllib.request
url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)
len_content = url_content.length

#remember to open file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer: break

        #an integer value of size of written data
        data_wrote = f.write(buffer)

#you could probably use with-open-as manner
url_connect.close()

This answer provides a solution to HTTP 403 Forbidden when downloading file over http using Python. I have tried only requests and urllib modules, the other module may provide something better, but this is the one I used to solve most of the problems.

查看更多
人间绝色
7楼-- · 2018-12-31 02:12

This may be a little late, But I saw pabloG's code and couldn't help adding a os.system('cls') to make it look AWESOME! Check it out :

    import urllib2,os

    url = "http://download.thinkbroadband.com/10MB.zip"

    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    os.system('cls')
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,

    f.close()

If running in an environment other than Windows, you will have to use something other then 'cls'. In MAC OS X and Linux it should be 'clear'.

查看更多
登录 后发表回答