I'm trying to program a simple web crawler using the Requests module, and I would like to know how to disable its default keep-alive feature.
I tried using:
s = requests.session()
s.config['keep_alive'] = False
However, I get an error stating that the session object has no attribute 'config'. I think this was changed in a newer version, but I cannot seem to find how to do it in the official documentation.
The problem is that when I run the crawler on a particular website, it fetches five pages at most and then keeps looping indefinitely, so I thought it might have something to do with the keep-alive feature!
PS: Is Requests a good module for a web crawler, or is there something better suited?
Thank you!
This works:
s = requests.session()
s.keep_alive = False
Answered in the comments of a similar question.
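For context, here is a minimal sketch of how that might look in a crawler loop. The URLs are placeholders, and s.keep_alive is simply the attribute from the snippet above:

import requests

s = requests.session()
s.keep_alive = False  # attribute from the snippet above

# hypothetical seed URLs, just for illustration
for url in ("http://example.com/a", "http://example.com/b"):
    r = s.get(url)
    print(url, r.status_code)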
I am not sure, but can you try passing {"Connection": "close"} as an HTTP header when sending a GET request with requests? This will close the connection as soon as the server returns a response.
>>> headers = {"Connection": "close"}
>>> r = requests.get('https://example.com', headers=headers)
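If you're making many requests from one Session, you could also set the header once as a session default so it is sent with every request (this uses the Session's standard headers attribute, not anything specific to the answer above):

import requests

s = requests.Session()
s.headers.update({"Connection": "close"})  # sent with every request from this session

r = s.get('https://example.com')
print(r.status_code)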
As @praveen suggested, the expected approach is to use the HTTP/1.1
header Connection: close
to notify the server that the connection should be closed after completion of the response.
Here is how it's described in RFC 2616:
HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,
Connection: close
in either the request or the response header fields indicates that the connection SHOULD NOT be considered `persistent' (section 8.1) after the current request/response is complete.
HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
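If you want to confirm the server honored the option, many (though not all) servers echo it back in the response headers; a quick check might look like this:

import requests

r = requests.get('https://example.com', headers={"Connection": "close"})
print(r.headers.get("Connection"))  # often "close" when the server agrees to close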