I have a small utility that I use to download an MP3 from a website on a schedule and that then builds/updates a podcast XML file, which I've obviously added to iTunes.
The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3. I would prefer to have the entire utility written in Python.
I struggled to find a way to actually download the file in Python, which is why I resorted to wget.
So, how do I download the file using Python?
In Python 2, use urllib2 which comes with the standard library.
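A minimal sketch of that basic usage, assuming a placeholder URL and file name. The `download` helper is just an illustrative name, and the `try`/`except` import is only there so the same snippet also runs on Python 3, where `urllib2` became `urllib.request`:

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3


def download(url, file_name):
    """Fetch `url` and write the response body to `file_name`."""
    response = urlopen(url)
    data = response.read()   # reads the whole body into memory
    response.close()
    with open(file_name, "wb") as f:
        f.write(data)
    return len(data)         # number of bytes written
```

Calling it would look something like `download("http://www.example.com/songs/mp3.mp3", "mp3.mp3")`, where the URL is only a placeholder.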
This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers. The documentation can be found here.
If speed matters to you, I made a small performance test for the modules urllib and wget, and regarding wget I tried once with the status bar and once without. I took three different 500MB files to test with (different files, to eliminate the chance that there is some caching going on under the hood). Tested on a Debian machine, with Python 2.
First, these are the results (they are similar in different runs):
I performed the test using a "profile" decorator. This is the full code:
urllib seems to be the fastest.
A simple yet Python 2 & Python 3 compatible way comes with the six library:
Following are the most commonly used calls for downloading files in Python:
urllib.urlretrieve('url_to_file', file_name)
urllib2.urlopen('url_to_file')
requests.get(url)
wget.download('url', file_name)
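The first two spellings above are the Python 2 names; in Python 3 both functions live in `urllib.request`. Here is a small sketch of `urlretrieve` that should work on either version, assuming six is installed (it falls back to the Python 3 standard library if it is not). The helper name `save_with_urlretrieve` is hypothetical:

```python
try:
    # six exposes one import path for both Python versions (pip install six).
    from six.moves.urllib.request import urlretrieve
except ImportError:
    # Without six, fall back to the Python 3 standard library name.
    from urllib.request import urlretrieve


def save_with_urlretrieve(url, file_name):
    """Download `url` straight to `file_name` and return the local path."""
    local_path, _headers = urlretrieve(url, file_name)
    return local_path
```

Usage would be along the lines of `save_with_urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")`, again with a placeholder URL.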
Note: urlopen and urlretrieve are found to perform relatively badly when downloading large files (size > 500 MB). requests.get stores the file in memory until the download is complete.
urlretrieve and requests.get are simple, but reality is not. I have fetched data from a couple of sites, including text and images, and the above two probably solve most of the tasks. But for a more universal solution I suggest the use of urlopen. As it is included in the Python 3 standard library, your code can run on any machine that runs Python 3 without pre-installing site-par
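A minimal sketch of the urlopen route that also avoids holding the whole file in memory, using only the Python 3 standard library. The function name `stream_to_file` and the 64 KiB chunk size are illustrative choices, not something from the posts above:

```python
import shutil
from urllib.request import urlopen


def stream_to_file(url, file_name, chunk_size=64 * 1024):
    """Copy the response to disk in chunks instead of one big read()."""
    with urlopen(url) as response, open(file_name, "wb") as out:
        # copyfileobj reads and writes `chunk_size` bytes at a time.
        shutil.copyfileobj(response, out, chunk_size)
    return file_name
```

Because only one chunk is in memory at any moment, this should behave better on large files than calling `read()` on the whole response.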
This answer provides a solution to HTTP 403 Forbidden when downloading a file over HTTP using Python. I have tried only the requests and urllib modules; other modules may provide something better, but this is the one I used to solve most of the problems.
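One common cause of a 403 is the server rejecting the default Python User-Agent. Assuming that is what is happening, a sketch with urllib could look like the following. The `USER_AGENT` value and both helper names are assumptions for illustration, and some servers may want a different header entirely:

```python
from urllib.request import Request, urlopen

# Assumption: a browser-like string; servers differ in what they accept.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def download_with_user_agent(url, file_name):
    """Fetch `url` with the custom header and save the body to `file_name`."""
    with urlopen(build_request(url)) as response, open(file_name, "wb") as out:
        out.write(response.read())
    return file_name
```

With requests, the equivalent would be passing the same dictionary via the `headers` argument of `requests.get`.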
This may be a little late, but I saw pabloG's code and couldn't help adding an os.system('cls') to make it look AWESOME! Check it out:
If running in an environment other than Windows, you will have to use something other than 'cls'. On Mac OS X and Linux it should be 'clear'.
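To avoid hard-coding either command, the choice can be made at runtime. A small sketch, where `clear_command` and `clear_screen` are made-up helper names:

```python
import os


def clear_command():
    """Return the shell command that clears the terminal on this OS."""
    # os.name is 'nt' on Windows and 'posix' on Mac OS X and Linux.
    return "cls" if os.name == "nt" else "clear"


def clear_screen():
    """Clear the terminal, e.g. before redrawing a progress line."""
    os.system(clear_command())
```

Calling `clear_screen()` wherever the code used `os.system('cls')` should then work on all three platforms.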