I'm trying to retrieve the number of Pins for a given URL. I wrote this Python script, which takes two URLs and prints the Pin count for each. When I run it on my local machine I get a 200 response containing the Pin count; however, when I run the exact same script on my EC2 instance I get a 403 error.
Here is the Python script:
#!/usr/bin/python
import requests

# Pinterest API
pinterest_endpoint = "http://api.pinterest.com/v1/urls/count.json?callback=&url="

# Emulate a SQL Query result (id, url)
results = [(1, "http://allrecipes.com/recipe/easter-nests/detail.aspx"), (2, "http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html")]

# Cycle thru each URL
for url in results:
    # Print URL details
    print url[0]
    print url[1]
    print type(url[0])
    print type(url[1])
    print "Downloading: ", url[1]

    # Create Complete URL
    target_url = pinterest_endpoint + url[1]
    print target_url

    # Hit Pinterest API
    r = requests.get(target_url)
    print r
    print r.text

    # Parse string response
    start = r.text.find('\"count\"')
    end = r.text.find(',', start+1)
    content = len('\"count\"')
    pin_count = int(r.text[(start+content+1):end].strip())
    print pin_count
This is the response I get on my local machine (Ubuntu 12.04):
$ python pin_count.py
1
http://allrecipes.com/recipe/easter-nests/detail.aspx
<type 'int'>
<type 'str'>
Downloading: http://allrecipes.com/recipe/easter-nests/detail.aspx
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://allrecipes.com/recipe/easter-nests/detail.aspx
<Response [200]>
({"count": 997, "url": "http://allrecipes.com/recipe/easter-nests/detail.aspx"})
997
2
http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
<type 'int'>
<type 'str'>
Downloading: http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
<Response [200]>
({"count": 993, "url": "http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html"})
993
This is the response I get when I run the same script on my EC2 instance (Ubuntu):
$ python pin_count.py
1
http://allrecipes.com/recipe/easter-nests/detail.aspx
<type 'int'>
<type 'str'>
Downloading: http://allrecipes.com/recipe/easter-nests/detail.aspx
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://allrecipes.com/recipe/easter-nests/detail.aspx
<Response [403]>
{ "status": 403, "message": "Forbidden" }
Traceback (most recent call last):
  File "cron2.py", line 32, in <module>
    pin_count = int(r.text[(start+content+1):end].strip())
ValueError: invalid literal for int() with base 10: 'us": 403'
I understand why the ValueError is raised. What I don't understand is why I get a 403 response when I run the script from my EC2 instance, while it works as expected from my local machine.
Any help would be much appreciated!
This question was asked a few years ago, and I believe the current answer is out of date. The above script now runs on EC2 with a successful response, without the need for a proxy. I came across this question while investigating a similar issue of my own with Google App Engine.
Pinterest is probably blocking requests from IP ranges owned by Amazon, resulting in the 403 Forbidden error. Pinterest doesn't officially support their API, so my supposition is that they are blocking the largest likely sources of commercial usage. You can test this by running the script on an instance from a non-AWS provider.
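When comparing hosts, it helps if the script reports a blocked request instead of crashing in the string slicing. Here is a minimal sketch, assuming the response bodies look like the two shown in the question (a parenthesised JSON object on success, a plain JSON object on a 403). `parse_pin_count` is a hypothetical helper name, not part of requests or of the Pinterest API.

```python
import json

# Sample bodies copied from the output shown in the question.
OK_BODY = '({"count": 997, "url": "http://allrecipes.com/recipe/easter-nests/detail.aspx"})'
BLOCKED_BODY = '{ "status": 403, "message": "Forbidden" }'


def parse_pin_count(status_code, body):
    """Return the Pin count from a count.json response, or None if not 200."""
    if status_code != 200:
        # e.g. the 403 Forbidden response seen from EC2
        return None
    # With an empty callback the JSON arrives wrapped in parentheses: ({...})
    text = body.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return json.loads(text)["count"]


print(parse_pin_count(200, OK_BODY))
print(parse_pin_count(403, BLOCKED_BODY))
```

In the original script this would be called as `parse_pin_count(r.status_code, r.text)`, so a `None` result on one host and a number on another points at the host's IP being blocked rather than at the parsing.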
Not an answer, but hopefully this will save someone else an hour trying this approach: Pinterest, unsurprisingly, also appears to block requests from Tor exit routers.
I had the same problem with the same endpoint and narrowed it down to EC2 + Pinterest as well. I attempted to circumvent it by routing the request through Tor.
I have cycled through numerous exit routers, and the response is consistently:
{ "status": 403, "message": "Forbidden" }
As a workaround, I am going to route the requests through a private HTTP proxy server.
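For anyone taking the proxy route, here is a rough sketch of what that could look like with requests, which accepts a `proxies` mapping on each call. The proxy address below is a placeholder, and `build_proxies` and `fetch_via_proxy` are hypothetical helper names. Whether this avoids the 403 depends entirely on whether the proxy's own IP is blocked.

```python
PINTEREST_ENDPOINT = "http://api.pinterest.com/v1/urls/count.json?callback=&url="

# Placeholder address (assumption): replace with your own proxy's host and port.
PROXY_URL = "http://proxy.example.com:3128"


def build_proxies(proxy_url):
    """Build the mapping that requests accepts in its `proxies` argument."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(page_url, proxy_url=PROXY_URL):
    """Query the count endpoint for page_url through the given HTTP proxy."""
    # Third-party library; imported here so build_proxies has no dependency.
    import requests

    return requests.get(
        PINTEREST_ENDPOINT + page_url,
        proxies=build_proxies(proxy_url),
        timeout=10,
    )
```

Calling `fetch_via_proxy("http://allrecipes.com/recipe/easter-nests/detail.aspx")` returns the usual response object, so `status_code` and `text` can be checked exactly as in the original script.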