How do I fix an HTTP Error 400: Bad Request?

Posted 2019-05-31 05:09

Question:

I am trying to scrape a web page with urllib.

For some reason I am getting HTTP Error 400: Bad Request. I have never had this before.

Any ideas?

Here is my code:

import urllib.request
import re

url = ('https://www.myvue.com/whats-on')

req = urllib.request.Request(url, headers={'User Agent': 'Mozilla/5.0'})

def main():

    html_page = urllib.request.urlopen(req).read()

    content=html_page.decode(errors='ignore', encoding='utf-8')

    headings = re.findall('<th scope="col" abbr="(.*?)">', content)

    print(headings)

main()

Answer 1:

Fix your header:

req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

It's User-Agent, not User Agent.
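To see what each spelling turns into before anything is sent, you can inspect the Request object offline. This is only a sketch: it relies on urllib.request.Request normalizing header names with str.capitalize(), and the names bad and good are illustrative, not from the question.

```python
import urllib.request

url = 'https://www.myvue.com/whats-on'

# Header name with a space, as in the question (illustrative name: bad)
bad = urllib.request.Request(url, headers={'User Agent': 'Mozilla/5.0'})

# Header name with a hyphen, as in the fix (illustrative name: good)
good = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# Nothing is sent here; header_items() only lists what the Request holds.
print(bad.header_items())   # [('User agent', 'Mozilla/5.0')]
print(good.header_items())  # [('User-agent', 'Mozilla/5.0')]
```

With the space, the request carries a header called "User agent", which is not a valid HTTP header name, so a server is entitled to reject it with 400. It also does not replace urllib's default User-Agent header.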


Additionally, I would recommend switching over to the requests module.

import requests

html_page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text

This one call replaces the three urllib steps (building the request, opening it, and decoding the body) and is much more readable. In addition, it automatically decodes the content for you.
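A slightly fuller sketch of the requests version is below, assuming the requests package is installed. The helper names preview and fetch_headings and the 10-second timeout are my own choices, not part of the question. Note that the headers go in through the headers= keyword, because the second positional argument of requests.get is params (the query string).

```python
import re

import requests

URL = 'https://www.myvue.com/whats-on'
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def preview(url, headers):
    """Build the request without sending it, to inspect what would go out."""
    return requests.Request('GET', url, headers=headers).prepare()


def fetch_headings(url=URL, headers=HEADERS):
    """Fetch the page and return the abbr values of its column headers."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return re.findall(r'<th scope="col" abbr="(.*?)">', response.text)


if __name__ == '__main__':
    # Offline check: confirm the header name and URL before hitting the site.
    prepared = preview(URL, HEADERS)
    print(prepared.headers['User-Agent'])
    print(prepared.url)
```

Calling fetch_headings() performs the actual request. Whether the regular expression from the question still matches anything depends on the site's current markup.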