I want to get the content from the website below. If I use a browser like Firefox or Chrome, I can get the real website page I want, but if I use the Python requests package (or the wget command), it returns a totally different HTML page. I suspect the developer of the website has put some blocks in place for this, so the question is:
How do I fake a browser visit by using python requests or command wget?
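For reference, this is roughly what I am doing; the URL is a placeholder for the actual site:

```python
import requests

# A bare request with no browser-like headers; the server detects
# this and serves a different page than a real browser would get.
response = requests.get("http://www.example.com/page.html")  # placeholder URL
print(response.text)
```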
If this question is still valid: I used the fake-useragent package.
How to use:
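Something like this (the target URL is a placeholder):

```python
from fake_useragent import UserAgent
import requests

ua = UserAgent()

# Build a header with a real-world Chrome User-Agent string
# drawn from fake-useragent's database.
header = {'User-Agent': str(ua.chrome)}
print(header)

url = "https://www.example.com/page.html"  # placeholder URL
response = requests.get(url, headers=header)
print(response)
```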
Output:
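Assuming the request succeeds, the output looks something like this (the exact User-Agent string varies, since fake-useragent picks from a database of real-world strings):

```
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}
<Response [200]>
```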
Provide a `User-Agent` header. FYI, lists of User-Agent strings for different browsers are easy to find online.
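For example, a sketch along these lines (both the URL and the User-Agent string are illustrative):

```python
import requests

url = 'http://www.example.com/page.html'  # placeholder for the real site

# Any current browser User-Agent string will do here.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/90.0.4430.93 Safari/537.36'
}

# With a browser-like User-Agent, the server serves the normal page.
response = requests.get(url, headers=headers)
print(response.status_code)
print(response.text[:500])
```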
As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents.
Demo:
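Along these lines (the returned strings depend on fake-useragent's current database, so these are illustrative):

```python
>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
>>> ua.firefox
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'
>>> ua.random  # a random real-world User-Agent string
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15'
```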
Try doing this, using Firefox as the fake user agent (moreover, it's a good starter script for web scraping with the use of cookies):
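A sketch of such a script, using a requests.Session so cookies persist across calls (the headers and the get_content helper are illustrative):

```python
import requests

# A Session persists cookies across requests, which many sites
# require before they serve their real content.
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) '
                  'Gecko/20100101 Firefox/88.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
})

def get_content(url, referer=None):
    """Fetch a page, optionally sending a Referer header."""
    headers = {'Referer': referer} if referer else {}
    response = session.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text
```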
USAGE:
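A hypothetical call, using the get_content sketch above:

```python
# Fetch a page while pretending to arrive from a search results page.
html = get_content('http://www.example.com/page.html',
                   referer='https://www.google.com/')
print(html[:500])
```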
The root of the answer is that the person asking the question needs a JavaScript interpreter to get what they are after. What I have found is that I am often able to get all of the information I want from a website as JSON before it is interpreted by JavaScript. This has saved me a ton of time that would otherwise be spent parsing HTML and hoping each page has the same format.

So when you get a response from a website using requests, really look at the HTML/text, because you might find the JavaScript's JSON in the footer, ready to be parsed.
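A sketch of that approach, assuming the page embeds its data in a script tag; the variable name pageData, the URL, and the headers are all hypothetical:

```python
import json
import re

import requests

url = 'http://www.example.com/page.html'  # placeholder
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/90.0.4430.93 Safari/537.36'
}
html = requests.get(url, headers=headers).text

# Many pages ship their data as a JSON blob assigned to a JS variable,
# e.g. `var pageData = {...};` near the bottom of the HTML. A simple
# non-greedy pattern works when the blob itself contains no '};'.
match = re.search(r'var\s+pageData\s*=\s*(\{.*?\});', html, re.DOTALL)
if match:
    data = json.loads(match.group(1))
    print(data.keys())
```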