A specific site returns a different response to Python than to the browser

Posted 2019-03-03 10:21

Question:

I am trying to access a specific site using Python, but no matter which library I use, I can't get the response I expect.

I have tried Selenium with PhantomJS, as well as requests and urllib.

When I open the URL in a browser I get a JSON file, but when I request it from a Python script I get an HTML page instead (with a huge minified script inside it).

I suspect the site detects that I'm sending the request from a headless client and blocks it, but I can't figure out how it does that.
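One concrete thing a server can key on is the User-Agent header. By default urllib identifies itself as "Python-urllib/<version>", requests as "python-requests/<version>", and PhantomJS's default User-Agent contains the string "PhantomJS". No regular browser sends any of these. The sketch below only prints what a default urllib opener advertises; it does not establish that this particular site filters on it, and default_user_agent is a hypothetical helper name.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent a default urllib opener would send."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders)["User-agent"]


print(default_user_agent())  # prints something like "Python-urllib/3.11"
```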

The site address is: http://www.yesplanet.co.il/presentationsJSON

I would very much appreciate it if anyone could point me in the right direction. Thanks!

EDIT: Here's my Selenium code:

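from selenium import webdriver

# Note: PhantomJS support was deprecated in Selenium 3 and removed in Selenium 4
driver = webdriver.PhantomJS()
driver.get("http://www.yesplanet.co.il/presentationsJSON")
source = driver.page_source  # expected the JSON, but this holds the HTML page
driver.quit()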

At this point I print source and see that it is not what I expected.

Here is a requests implementation that also does not work:

import requests

res = requests.get("http://www.yesplanet.co.il/presentationsJSON")
source = res.content  # raw bytes of the response body (HTML here, not JSON)

The same thing happens here.
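To check programmatically which of the two responses a script received, rather than eyeballing the printed source, one option is to try parsing the body as JSON and fall back to the Content-Type header or an HTML-looking prefix. classify_response is a hypothetical helper using only the standard library; with requests, res.content and res.headers.get("Content-Type", "") would be its inputs.

```python
import json


def classify_response(body: bytes, content_type: str) -> str:
    """Return "json", "html", or "other" for a fetched response body."""
    try:
        # A body that parses as JSON is treated as JSON regardless of the header.
        json.loads(body.decode("utf-8"))
        return "json"
    except ValueError:
        # Covers both UnicodeDecodeError and json.JSONDecodeError.
        pass
    # Otherwise rely on the declared type or an HTML-looking start of the body.
    prefix = body.lstrip().lower()
    if "html" in content_type.lower() or prefix.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "other"
```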

Answer 1:

It works for me if I set a number of headers, including a cookie:

curl -H "Cookie: rbzid=d29SMXE1Rktrdm5kS2x0YW5EdVZwUzNpYVhWdUlJSndlVzEvUU9vWG5OU2dRSVNnWTc3TWYwaHN4V2REVGJyNFBMSFl1bXErMGFLNXNtUGxVb0ZwS3dVRDRhajEwczFMMmE3cUc1blBmaTEzeFZFWGhrbHgrUXhNeHRhZnhWNjBib1pTenM5bjFvOUhVRVoxOTNGRHBYQXQwVzVsYXdSSXliME5LeUZjU0Rhb2tHa09ycUNVYmJyOUVjMERJN3daaUlFUGhwUHpvT0dDblcwU0wwMEM3NlJZRGw1K1pXZ2NKNkJRTWhvNUtaZz1AQEAxOTVAQEAtNjY2NjY2NjYwNjA-" \
     -H "Accept-Language: en-US,en;q=0.8,ja;q=0.6" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
     -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" \
     http://www.yesplanet.co.il/presentationsJSON
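In a Python script, the same request can be reproduced with the standard library's urllib, which the question also mentions. This is a sketch under stated assumptions: the header values are copied from the curl command above, while COOKIE is a placeholder you would fill with the rbzid value your own browser received, since that value is tied to a browser session and may expire. HEADERS, COOKIE, build_request and fetch are hypothetical names.

```python
import urllib.request

URL = "http://www.yesplanet.co.il/presentationsJSON"

# Placeholder (assumption): paste the rbzid cookie your browser got, from DevTools.
COOKIE = "rbzid=<value copied from the browser>"

# Browser-like headers, mirroring the curl command.
HEADERS = {
    "Cookie": COOKIE,
    "Accept-Language": "en-US,en;q=0.8,ja;q=0.6",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    ),
}


def build_request(url=URL, headers=None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS if headers is None else headers)


def fetch(url=URL) -> bytes:
    """Send the request and return the raw body (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read()
```

If the cookie is what the site checks, fetch() with a current value should return the JSON; with a stale or missing cookie, expect the HTML page again.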

I'm not sure which of the other headers matter.
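One way to find out which headers matter is to drop them one at a time and keep only those whose removal breaks the response. Below is a generic greedy elimination; minimal_headers is a hypothetical helper, and works stands for any callable you supply that sends a request with the given headers and reports whether JSON came back (for example, a fetch followed by a content check).

```python
from typing import Callable, Dict


def minimal_headers(
    headers: Dict[str, str], works: Callable[[Dict[str, str]], bool]
) -> Dict[str, str]:
    """Greedily drop every header that `works` does not need."""
    if not works(headers):
        raise ValueError("the full header set does not work to begin with")
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        # If the check still passes without this header, leave it out for good.
        if works(trial):
            needed = trial
    return needed
```

Against a live site each call to works sends one request, and the result is a minimal set for that removal order rather than necessarily the smallest possible one; it also assumes the site answers consistently between attempts.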

I looked at which headers Chrome was sending by checking the Network panel in DevTools.

From that I can also see that Chrome made two requests.
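If the first of those two requests is the one that hands out the cookie, the client has to keep cookies between requests. With the standard library that means an opener built around http.cookiejar.CookieJar (requests.Session does the same job). This sketch assumes the cookie arrives in a Set-Cookie response header; if the minified script instead computes and sets it in JavaScript, a plain HTTP client will never receive it, and you need a real browser engine or a cookie copied by hand as in the curl command. make_session_opener and fetch_twice are hypothetical names.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar); the opener stores cookies in jar and sends them back."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_twice(url: str):
    """Two-step fetch: returns the second body and the names of stored cookies."""
    opener, jar = make_session_opener()
    with opener.open(url, timeout=10) as first:
        first.read()  # first request: any Set-Cookie headers land in the jar
    with opener.open(url, timeout=10) as second:
        body = second.read()  # second request: stored cookies are sent along
    return body, [cookie.name for cookie in jar]
```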