Pinterest API - returning 403 on EC2 Instance

Posted 2019-03-05 07:06

Question:

I'm attempting to retrieve the number of Pins for a given URL. I created this Python script, which takes two separate URLs and prints the number of Pins for each. When I run the script on my local machine, I get a 200 response containing the Pin count. However, when I run the exact same script on my EC2 instance, I get a 403 error.

Here is the Python script:

#!/usr/bin/python

import requests

# Pinterest API
pinterest_endpoint = "http://api.pinterest.com/v1/urls/count.json?callback=&url="

# Emulate a SQL Query result (id, url)
results = [(1, "http://allrecipes.com/recipe/easter-nests/detail.aspx"), (2, "http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html")]

# Cycle thru each URL
for url in results:
    # Print URL details
    print url[0]
    print url[1]
    print type(url[0])
    print type(url[1])
    print "Downloading: ", url[1]

    # Create Complete URL
    target_url = pinterest_endpoint + url[1]
    print target_url

    # Hit Pinterest API
    r = requests.get(target_url)
    print r
    print r.text
    # Parse string response
    start = r.text.find('\"count\"')
    end = r.text.find(',', start+1)
    content = len('\"count\"')
    pin_count = int(r.text[(start+content+1):end].strip())
    print pin_count

This is the response I get on my local machine (Ubuntu 12.04):

$ python pin_count.py
1
http://allrecipes.com/recipe/easter-nests/detail.aspx
<type 'int'>
<type 'str'>
Downloading:  http://allrecipes.com/recipe/easter-nests/detail.aspx
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://allrecipes.com/recipe/easter-nests/detail.aspx
<Response [200]>
({"count": 997, "url": "http://allrecipes.com/recipe/easter-nests/detail.aspx"})
997
2
http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
<type 'int'>
<type 'str'>
Downloading:  http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
<Response [200]>
({"count": 993, "url": "http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html"})
993

This is the response I get when I run the same script on my EC2 instance (Ubuntu):

$ python pin_count.py
1
http://allrecipes.com/recipe/easter-nests/detail.aspx
<type 'int'>
<type 'str'>
Downloading:  http://allrecipes.com/recipe/easter-nests/detail.aspx
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://allrecipes.com/recipe/easter-nests/detail.aspx
<Response [403]>
{ "status": 403, "message": "Forbidden" }
Traceback (most recent call last):
  File "cron2.py", line 32, in <module>
    pin_count = int(r.text[(start+content+1):end].strip())
ValueError: invalid literal for int() with base 10: 'us": 403'

I understand why it raises a ValueError. What I don't understand is why I get a 403 response when I run the script from my EC2 instance, while it works as expected from my local machine.
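The ValueError itself is a side effect of slicing a number out of whatever body comes back, including the 403 error body. A minimal sketch, assuming the endpoint returns the parenthesised JSON body shown in the local output above, that checks the status code first and parses the body with the json module instead of string slicing. parse_pin_count is a hypothetical helper name, not part of any Pinterest library.

```python
import json


def parse_pin_count(status_code, body):
    """Return the Pin count from a count.json response body (hypothetical helper).

    Raises RuntimeError for non-200 responses instead of trying to
    slice a number out of an error message.
    """
    if status_code != 200:
        raise RuntimeError("Pinterest returned HTTP %d: %s" % (status_code, body.strip()))
    text = body.strip()
    # With an empty callback= parameter the body is wrapped in parentheses,
    # e.g. ({"count": 997, ...}); strip them if present before parsing.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return json.loads(text)["count"]


print(parse_pin_count(200, '({"count": 997, "url": "http://allrecipes.com/recipe/easter-nests/detail.aspx"})'))
# 997
```

In the question's loop, pin_count = parse_pin_count(r.status_code, r.text) would replace the find calls and the slice, so a blocked request fails with the 403 message rather than a confusing ValueError.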

Any help would be much appreciated!

Answer 1:

Not an answer, but hopefully this will save someone else an hour spent on this approach: Pinterest, unsurprisingly, also appears to block requests from Tor exit routers.

I had the same problem with the same endpoint and also narrowed it down to EC2 + Pinterest. I tried to circumvent it by routing the request through Tor.

import requests


class PinterestService(object):  # originally subclassed the author's own Service base class
    service_url = "http://api.pinterest.com/v1/urls/count.json?callback="
    url_param = 'url'

    def get_response(self, url, **params):
        params[self.url_param] = url

        # Privoxy listens by default on port 8118 and speaks plain HTTP,
        # so the proxy URL must use http://, not socks5://.
        # On the EC2 instance, Privoxy is configured to forward requests
        # to Tor's SOCKS5 port, as described here:
        # http://fixitts.com/2012/05/26/installing-tor-and-privoxy-on-ubuntu-server-or-any-other-linux-machine/
        http_proxy = "http://127.0.0.1:8118"

        proxy_dict = {
            "http": http_proxy
        }

        return requests.get(self.service_url, params=params, proxies=proxy_dict)

I have cycled through numerous exit routers, and the response is consistently { "status": 403, "message": "Forbidden" }

As a workaround, I am going to route requests through a private HTTP proxy server.
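Routing through a private HTTP proxy with requests only needs a proxies mapping. A minimal sketch, assuming a proxy reachable at a hypothetical address (PROXY_URL, build_proxies and fetch_pin_count_response are illustrative names; substitute your own host, port and credentials). It also passes the target page through params, so requests percent-encodes it rather than concatenating it into the query string by hand.

```python
import requests

# Hypothetical values: replace with your own proxy host, port and credentials.
COUNT_ENDPOINT = "http://api.pinterest.com/v1/urls/count.json"
PROXY_URL = "http://user:password@proxy.example.com:3128"


def build_proxies(proxy_url):
    """Send both http:// and https:// targets through the same HTTP proxy."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_pin_count_response(page_url, proxy_url=PROXY_URL, timeout=10):
    """Request the Pin count for page_url through the given proxy."""
    # requests encodes page_url (":" -> %3A, "/" -> %2F) when building the query.
    return requests.get(
        COUNT_ENDPOINT,
        params={"callback": "", "url": page_url},
        proxies=build_proxies(proxy_url),
        timeout=timeout,
    )
```

Whether this helps depends entirely on the proxy's own IP address not being in a range that Pinterest blocks.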



Answer 2:

This question was asked a few years ago, and I believe the existing answer is out of date. EC2 now runs the above script with a successful response, without the need for a proxy. I came across this question while investigating a similar issue of my own with Google App Engine.



Answer 3:

Pinterest is probably blocking requests from IP blocks owned by Amazon, which results in the 403 Forbidden error. Pinterest doesn't officially support this API, so my guess is that they are blocking the largest likely sources of commercial usage of it. You can test this by running the script on an instance from a non-AWS provider.
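Part of this premise can be checked directly: AWS publishes its address ranges at https://ip-ranges.amazonaws.com/ip-ranges.json, as a "prefixes" list of objects with an "ip_prefix" field plus an "ipv6_prefixes" list with "ipv6_prefix". A minimal sketch, using hypothetical helper names, that reports which published ranges contain a given address. It only shows that an address belongs to Amazon; it cannot show that Pinterest blocks it.

```python
import ipaddress
import json
import urllib.request

RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def load_aws_ranges(url=RANGES_URL):
    """Download and parse AWS's published IP ranges (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def aws_prefixes_for(ip, ranges):
    """Return the entries from ip-ranges.json whose prefix contains ip."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        entries, key = ranges.get("prefixes", []), "ip_prefix"
    else:
        entries, key = ranges.get("ipv6_prefixes", []), "ipv6_prefix"
    return [e for e in entries if addr in ipaddress.ip_network(e[key])]
```

On the instance, pass its public IP together with load_aws_ranges() to aws_prefixes_for. A non-empty list means the address is in a published AWS range, which is consistent with (but not proof of) range-based blocking.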