I am trying to web scrape, here is my code.
For some reason I am getting HTTP Error 400: Bad Request, I have never had this before.
Any ideas?
Here is my code:
import urllib.request
import re
url = ('https://www.myvue.com/whats-on')
req = urllib.request.Request(url, headers={'User Agent': 'Mozilla/5.0'})
def main():
html_page = urllib.request.urlopen(req).read()
content=html_page.decode(errors='ignore', encoding='utf-8')
headings = re.findall('<th scope="col" abbr="(.*?)">', content)
print(headings)
main()
Fix your header:
It's
User-Agent
, notUser Agent
.Additionally, I would recommend switching over to the
requests
module.This is the equivalent of three lines of
urllib
and much more readable. In addition, it automatically decodes the content for you.