This is related to this question. I was trying to query the Glassdoor public API using the documented parameters, but kept getting a 403 Forbidden response. To check that the query parameters were being composed into the URL correctly, I pasted the composed URL into my browser, and it worked.
Working backwards from the request my browser was making, I figured out that the user agent must be supplied twice: as the useragent parameter in the URL and as the User-Agent header of the HTTP request.
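One way to see the difference between the browser's request and the script's is to build the request without sending it and look at the headers that requests would attach. This is only a sketch: the parameter values below are hypothetical placeholders, and whether Glassdoor actually rejects the library's default User-Agent is an assumption on my part, not something I have confirmed.

```python
import requests

# Hypothetical placeholder values; real credentials are not needed
# because nothing is sent over the network here.
url = "http://api.glassdoor.com/api/api.htm"
params = {"v": "1", "format": "json", "action": "employers",
          "useragent": "Mozilla/5.0"}


def prepared_request(extra_headers=None):
    """Build (but do not send) the GET request; return its URL and headers."""
    session = requests.Session()
    req = requests.Request("GET", url, params=params, headers=extra_headers)
    prepared = session.prepare_request(req)
    return prepared.url, dict(prepared.headers)


# Without an explicit header, requests fills in its own User-Agent.
default_url, default_headers = prepared_request()
# With an explicit header, the browser-like value replaces the default.
browser_url, browser_headers = prepared_request({"User-Agent": "Mozilla/5.0"})

print(default_headers["User-Agent"])
print(browser_headers["User-Agent"])
```

The URL is identical in both cases; only the User-Agent header differs. Without the explicit header it is the library's own default (a string starting with python-requests/), which is not what a browser sends.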
So putting this all together, here is code that queries the Glassdoor public API successfully:
import json
import urllib.request as request
from collections import OrderedDict

import requests

# the same user agent string is sent both as a URL parameter and as a request header
user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

# authentication information & other request parameters
params_gd = OrderedDict({
    "v": "1",
    "format": "json",
    "t.p": "xxxxxx",
    "t.k": "yyyyyyyy",
    "action": "employers",
    "employerID": "11111",
    # programmatically get the IP of the machine
    "userip": json.loads(request.urlopen("http://ip.jsontest.com/").read().decode('utf-8'))['ip'],
    "useragent": user_agent,
})

# base URL of the API; requests appends the parameters as the query string
basepath_gd = 'http://api.glassdoor.com/api/api.htm'

# request the API
response_gd = requests.get(basepath_gd,
                           params=params_gd,
                           headers={"User-Agent": user_agent})

# check the response code (should be 200) & the content
response_gd
response_gd.content
My question is: why does the User-Agent need to be specified in the request header when it is already part of the URL parameters? Shouldn't the query work without the User-Agent header?