I wanted to write a piece of code like the following:
from bs4 import BeautifulSoup
import urllib2
url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)
But I found that I now have to install the urllib3 package.
Moreover, I couldn't find any tutorial or example showing how to rewrite the above code; for example, urllib3 does not have urlopen.
Any explanation or example, please?
P.S. I'm using Python 3.4.
urllib3 is a different library from urllib and urllib2. It adds many features on top of the urllib modules in the standard library, such as re-using connections, if you need them. The documentation is here: https://urllib3.readthedocs.org/
If you'd like to use urllib3, you'll need to run pip install urllib3. A basic example looks like this:
from bs4 import BeautifulSoup
import urllib3
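http = urllib3.PoolManager()  # manages a pool of reusable connections
url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
# response.data holds the raw body bytes; naming the parser avoids bs4's parser-guessing warning
soup = BeautifulSoup(response.data, "html.parser")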
You do not have to install urllib3. You can choose any HTTP-request-making library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests, because of its rich feature set and convenient API. You can install requests by entering pip install requests on the command line. Here is a basic example:
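from bs4 import BeautifulSoup
import requests
url = 'http://www.thefamouspeople.com/singers.php'
response = requests.get(url)
response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
# response.content is the raw body bytes; bs4 detects the encoding itself
soup = BeautifulSoup(response.content, "html.parser")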
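You can also skip third-party HTTP libraries entirely: Python 3 folded urllib2 into the standard-library module urllib.request, so the question's code mostly needs a changed import. Below is a minimal sketch, not taken from any of the answers here; the helper name fetch_html, the 10-second timeout, and the UTF-8 fallback are my own assumptions.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download url with the standard library and return the body as text.

    urllib.request.urlopen is the Python 3 home of urllib2.urlopen.
    The timeout default and the UTF-8 fallback are illustrative choices.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
        # Prefer the charset declared in the Content-Type header; assume
        # UTF-8 when the server does not declare one.
        charset = response.info().get_content_charset() or "utf-8"
    # errors="replace" keeps a mislabelled page from raising UnicodeDecodeError.
    return body.decode(charset, errors="replace")
```

With bs4 installed, parse the result with soup = BeautifulSoup(fetch_html('http://www.thefamouspeople.com/singers.php'), "html.parser"). The most literal port of the question is just html = urllib.request.urlopen(url) followed by BeautifulSoup(html, "html.parser"), since BeautifulSoup also accepts a file-like object and reads it itself.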
urllib3 has good documentation (see its User Guide).
To get your desired result, you should do the following:
import urllib3
from bs4 import BeautifulSoup
url = 'http://www.thefamouspeople.com/singers.php'
http = urllib3.PoolManager()
response = http.request('GET', url)
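soup = BeautifulSoup(response.data.decode('utf-8'), "html.parser")
The .decode('utf-8') call is optional: BeautifulSoup accepts raw bytes and detects the encoding itself. It worked without it when I tried, but I included it anyway. Note that hard-coding 'utf-8' fails on pages served in a different encoding.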
Source: User Guide
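If you do want to decode the bytes yourself, a somewhat safer option than a hard-coded 'utf-8' is to read the charset from the Content-Type response header and fall back to UTF-8 only when none is declared or Python does not recognise it. This is a minimal sketch; the helper name decode_html and the fallback policy are assumptions for illustration, not part of urllib3.

```python
from email.message import Message
from typing import Optional


def decode_html(body: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Decode an HTTP body using the charset named in its Content-Type header."""
    charset = None
    if content_type:
        # Let the email package parse "text/html; charset=..." parameters,
        # including quoted values; the returned name is lower-cased.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
    try:
        return body.decode(charset or default, errors="replace")
    except LookupError:
        # The header named a charset Python does not know; use the default.
        return body.decode(default, errors="replace")
```

With the urllib3 response from the answer above, usage would look like soup = BeautifulSoup(decode_html(response.data, response.headers.get("Content-Type")), "html.parser").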