I'm writing a small script in Python 2.7.3 with GRequests and lxml that will allow me to gather some collectible card prices from various websites and compare them. The problem is that one of the websites limits the number of requests and sends back HTTP error 429 if I exceed it.
Is there a way to throttle the number of requests in GRequests so that I don't exceed the number of requests per second I specify? Also, how can I make GRequests retry after some time if HTTP 429 occurs?
On a side note - their limit is ridiculously low. Something like 8 requests per 15 seconds. I breached it with my browser on multiple occasions just refreshing the page waiting for price changes.
Take a look at this for automatic requests throttling: https://pypi.python.org/pypi/RequestsThrottler/0.2.2
You can either set a fixed delay between requests or set a number of requests to send in a fixed number of seconds (which is basically the same thing):
The function multi_submit returns a list of ThrottledRequest objects (see the documentation linked at the end). You can then access the responses:
Alternatively you can achieve the same by specifying the number of requests to send in a fixed amount of time (e.g. 15 requests every 60 seconds):
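To make the "basically the same thing" point concrete, here is a minimal sketch that does not use RequestsThrottler at all. The names delay_for_rate and FixedDelayPacer are hypothetical helpers invented for illustration, and the clock and sleep parameters are only there so the behaviour can be checked without really waiting.

```python
import time


def delay_for_rate(max_requests, per_seconds):
    """Fixed delay (in seconds) equivalent to 'max_requests every per_seconds'."""
    return float(per_seconds) / max_requests


class FixedDelayPacer(object):
    """Hypothetical helper: keep a minimum gap between consecutive requests."""

    def __init__(self, delay, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest start time of the next request

    def wait(self):
        now = self._clock()
        if self._next_allowed is None:
            start = now
        else:
            start = max(now, self._next_allowed)
        # Reserve the slot before sleeping so concurrent callers queue up
        # behind each other instead of all waking at the same moment.
        self._next_allowed = start + self.delay
        if start > now:
            self._sleep(start - now)
```

With these assumptions, 15 requests every 60 seconds becomes FixedDelayPacer(delay_for_rate(15, 60)), i.e. one request every 4 seconds, and the limit from the question (8 requests per 15 seconds) becomes a delay of 1.875 seconds. Call wait() right before each request is sent.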
Both solutions can be implemented without using the with statement:
For more details: http://pythonhosted.org/RequestsThrottler/index.html
Doesn't look like there's any simple mechanism for handling this built into the requests or grequests code. The only hook that seems to be around is for responses.
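The response hook cannot slow requests down before they are sent, but it can at least notice a 429 after the fact. A rough sketch of that, assuming the usual requests hook signature hook(response, *args, **kwargs): the helper names retry_delay_for and make_429_hook are made up for this example, and the 15 second fallback is an assumption based on the limit described in the question.

```python
def retry_delay_for(status_code, headers, default_delay=15.0):
    """Return how many seconds to wait before retrying, or None if no retry is needed."""
    if status_code != 429:
        return None
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date instead of seconds:
        # fall back to the assumed default.
        return default_delay


def make_429_hook(retry_queue):
    """Build a response hook that records (url, delay) for every 429 response."""
    def hook(response, *args, **kwargs):
        delay = retry_delay_for(response.status_code, response.headers)
        if delay is not None:
            retry_queue.append((response.url, delay))
        return response
    return hook
```

The hook would be passed along as hooks={'response': make_429_hook(queue)} when building each request; the URLs collected in the queue can then be requested again once their delay has passed.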
Here's a super hacky work-around to at least prove it's possible - I modified grequests to keep a list of the time when a request was issued and sleep the creation of the AsyncRequest until the requests per second were below the maximum.
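A minimal standalone sketch of that idea, independent of the grequests internals: remember when recent requests were issued and sleep until the count inside the time window drops below the maximum. SlidingWindowLimiter is a hypothetical name, and the clock and sleep parameters are assumptions added so it can be exercised without real waiting (under gevent you would pass gevent.sleep or rely on monkey patching).

```python
import time
from collections import deque


class SlidingWindowLimiter(object):
    """Allow at most max_requests calls to acquire() in any `window` seconds."""

    def __init__(self, max_requests, window, clock=time.time, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # issue times of requests still inside the window

    def acquire(self):
        while True:
            now = self._clock()
            # Forget requests that have already left the window.
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest request expires, then re-check.
            self._sleep(self.window - (now - self._stamps[0]))
```

Because expired timestamps are dropped on every call, the deque never holds more than max_requests entries. Call acquire() just before each request object is created or sent.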
You can use grequests.imap() to watch this interactively
I wish there was a more elegant solution, but so far I can't find one. Looked around in sessions and adapters. Maybe the poolmanager could be augmented instead?
Also, I wouldn't put this code in production - the 'q' list never gets trimmed and would eventually get pretty big. Plus, I don't know if it's actually working as advertised. It just looks like it is when I look at the console output.
Ugh. Just looking at this code I can tell it's 3am. Time to goto bed.
I had a similar problem. Here's my solution. In your case, I would do:
Assuming you have multiple domains you're culling from, I would set up a dictionary mapping (domain, delay) so you don't hit your rate limits. This code assumes you're going to use gevent and monkey patch.
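The per-domain mapping could look roughly like the sketch below. DomainDelays and wait_for are hypothetical names, the domains and delays in the usage note are placeholders, and with gevent monkey patching the default time.sleep would yield to other greenlets.

```python
import time

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


class DomainDelays(object):
    """Keep a minimum delay between requests to the same domain."""

    def __init__(self, delays, default=0.0, clock=time.time, sleep=time.sleep):
        self.delays = dict(delays)  # domain -> seconds between requests
        self.default = default      # delay for domains not listed
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}     # domain -> earliest time of the next request

    def wait_for(self, url):
        domain = urlparse(url).netloc
        delay = self.delays.get(domain, self.default)
        now = self._clock()
        start = max(now, self._next_allowed.get(domain, now))
        # Reserve the slot first so concurrent greenlets line up one after another.
        self._next_allowed[domain] = start + delay
        if start > now:
            self._sleep(start - now)
        return domain
```

For example, DomainDelays({"cards.example.com": 2.0}) followed by wait_for(url) before each request keeps that one site at one request every two seconds while leaving other domains unthrottled.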
Going to answer my own question since I had to figure this out by myself and there seems to be very little info on this going around.
The idea is as follows. Every request object used with GRequests can take a session object as a parameter when created. Session objects on the other hand can have HTTP adapters mounted that are used when making requests. By creating our own adapter we can intercept requests and rate-limit them in the way we find best for our application. In my case I ended up with the code below.
Object used for throttling:
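A minimal sketch of what such a throttling object might look like, assuming a simple fixed-window burst counter (at most max_hits requests per period). The class name FixedWindowThrottle and its parameters are illustrative assumptions, not a definitive implementation.

```python
import time


class FixedWindowThrottle(object):
    """Allow bursts of up to max_hits calls per period, then wait for the next period."""

    def __init__(self, max_hits, period, clock=time.time, sleep=time.sleep):
        self.max_hits = max_hits
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._window_start = None
        self._hits = 0

    def throttle(self):
        now = self._clock()
        if self._window_start is None or now >= self._window_start + self.period:
            # No window yet, or the current one has expired: open a new one now.
            self._window_start = now
            self._hits = 0
        elif self._hits >= self.max_hits:
            # Current window is full: move on to the window that starts when it ends.
            self._window_start += self.period
            self._hits = 0
        self._hits += 1
        wait = self._window_start - now
        if wait > 0:
            # This call was booked into a window that has not started yet.
            self._sleep(wait)
```

For the limit in the question this would be FixedWindowThrottle(8, 15). A fixed window can let two bursts land close together around a window boundary, so if the server counts over a sliding window it may be safer to configure a slightly lower max_hits.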
HTTP adapter:
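One way to sketch the interception step is a small mixin that calls the throttle before handing the request on to the real adapter's send(). ThrottledSendMixin is a hypothetical name; the commented part shows how it is assumed to combine with requests.adapters.HTTPAdapter and Session.mount, and is left as comments so the block itself does not need requests installed.

```python
class ThrottledSendMixin(object):
    """Hypothetical mixin: run a throttle callable before every send()."""

    def __init__(self, throttle, *args, **kwargs):
        # Remaining arguments go to the real adapter's constructor.
        super(ThrottledSendMixin, self).__init__(*args, **kwargs)
        self._throttle = throttle  # zero-argument callable that may sleep

    def send(self, request, **kwargs):
        self._throttle()
        return super(ThrottledSendMixin, self).send(request, **kwargs)


# Assumed usage with requests and grequests (not executed here):
#
#   import requests
#   import grequests
#   from requests.adapters import HTTPAdapter
#
#   class ThrottledAdapter(ThrottledSendMixin, HTTPAdapter):
#       pass
#
#   session = requests.Session()
#   session.mount("http://", ThrottledAdapter(my_throttle))
#   reqs = [grequests.get(url, session=session) for url in urls]
```

Here my_throttle and urls are placeholders: my_throttle stands for any callable that blocks until the next request is allowed, and every request created with that session then passes through the throttled send().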
Setup: