When initializing a requests `Session`, two `HTTPAdapter` objects will be created and mounted to `http://` and `https://`.
This is how `HTTPAdapter` is defined:

```
class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10,
                                    max_retries=0, pool_block=False)
```
While I understand the meaning of `pool_maxsize` (the number of connections a pool can save), I don't understand what `pool_connections` means or what it does. The doc says:
> Parameters:
> pool_connections – The number of urllib3 connection pools to cache.
But what does "to cache" mean here? And what's the point of using multiple connection pools?
I wrote an article about this and have pasted it here:
Requests' secret: pool_connections and pool_maxsize
Requests is one of the most well-known third-party Python libraries, if not the most well-known. With its simple API and high performance, people tend to use requests instead of the standard library's urllib2 for HTTP requests. However, people who use requests every day may not know the internals, and today I want to introduce two of them: `pool_connections` and `pool_maxsize`.

Let's start with `Session`. It's pretty simple. You probably know that a requests `Session` can persist cookies. Cool. But do you know `Session` has a `mount` method? No? Well, in fact you've already used this method when you initialized a `Session` object.

Now comes the interesting part. If you've read Ian Cordasco's article Retries in Requests, you should know that `HTTPAdapter` can be used to provide retry functionality. But what is an `HTTPAdapter`, really? The docs describe it as the built-in transport adapter that lets a `Session` contact HTTP and HTTPS URLs through urllib3. If that still sounds abstract, here's my explanation: an HTTP adapter simply provides a different configuration for each request, selected according to the target URL. Remember what happens when a `Session` is initialized?
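Paraphrased from requests' source, `Session.__init__` ends with two `mount` calls. Here's a runnable check of the adapter registry that initialization produces:

```python
import requests

# Session.__init__ ends with (paraphrased from requests' source):
#     self.mount('https://', HTTPAdapter())
#     self.mount('http://', HTTPAdapter())
s = requests.Session()

# Both default adapters are registered, keyed by URL prefix:
assert set(s.adapters) == {'http://', 'https://'}
```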
It creates two `HTTPAdapter` objects with the default arguments `pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False`, and mounts them to `https://` and `http://` respectively. This means the first adapter's configuration will be used for requests to `https://xxx`, and the second's for requests to `http://xxx`. Though in this case the two configurations are the same, requests to `http` and `https` are still handled separately. We'll see what that means later.

As I said, the main purpose of this article is to explain `pool_connections` and `pool_maxsize`.
First let's look at `pool_connections`. Yesterday I raised a question on Stack Overflow because I wasn't sure whether my understanding was correct; the answer eliminated my uncertainty. HTTP, as we all know, is based on TCP. An HTTP connection is also a TCP connection, which is identified by a tuple of five values:

`(protocol, source IP, source port, destination IP, destination port)`

Say you've established an HTTP/TCP connection with `www.example.com`. Assuming the server supports `Keep-Alive`, the next time you send a request to `www.example.com/a` or `www.example.com/b`, you can just use the same connection, because none of the five values change. In fact, requests' `Session` automatically does this for you and will reuse connections as long as it can.

The question is, what determines whether you can reuse an old connection or not? Yes, `pool_connections`!

I know, I know, I don't want to introduce so much terminology either; this is the last one, I promise. For easy understanding: one connection pool corresponds to one host. That's what it is.
Here's an example (unrelated lines are omitted):
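A sketch of the setup, with DEBUG logging turned on so urllib3's connection messages are visible. The `demo()` wrapper is my addition, and the call is left commented out because it hits live hosts:

```python
import logging
import requests
from requests.adapters import HTTPAdapter

# Surface urllib3's "Starting new HTTPS connection" log lines.
logging.basicConfig(level=logging.DEBUG)

s = requests.Session()
# Cache at most one host's connection pool at a time.
s.mount('https://', HTTPAdapter(pool_connections=1))

def demo():
    s.get('https://www.baidu.com')   # new connection; pool is cached
    s.get('https://www.zhihu.com')   # different host: baidu's pool is evicted
    s.get('https://www.baidu.com')   # baidu's pool is gone, so connect again

# demo()  # uncomment to run against the live hosts
```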
`HTTPAdapter(pool_connections=1)` is mounted to `https://`, which means only one connection pool persists at a time. After calling `s.get('https://www.baidu.com')`, the cached connection pool is `connectionpool('https://www.baidu.com')`. Now `s.get('https://www.zhihu.com')` comes along, and the session finds that it cannot use the previously cached connection, because it's not the same host (one connection pool corresponds to one host, remember?). Therefore the session has to create a new connection pool, or connection if you like. Since `pool_connections=1`, the session cannot hold two connection pools at the same time, so it abandons the old one, `connectionpool('https://www.baidu.com')`, and keeps the new one, `connectionpool('https://www.zhihu.com')`. The next `get` is the same. This is why we see three `Starting new HTTPS connection` lines in the log.

What if we set `pool_connections` to 2? Great, now we only created connections twice and saved one connection-establishing round trip.
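That second run can be sketched like this, with the same hedges as the previous snippet (the `demo()` wrapper and commented-out call are mine):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# Room for two hosts' connection pools at once.
s.mount('https://', HTTPAdapter(pool_connections=2))

def demo():
    s.get('https://www.baidu.com')   # new connection
    s.get('https://www.zhihu.com')   # new connection; both pools now cached
    s.get('https://www.baidu.com')   # reuses the cached baidu connection

# demo()  # network access; run manually
```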
Finally, `pool_maxsize`.

First and foremost, you should care about `pool_maxsize` only if you use a `Session` in a multithreaded environment, e.g. making concurrent requests from multiple threads using the same `Session`.

Actually, `pool_maxsize` is an argument for initializing urllib3's `HTTPConnectionPool`, which is exactly the connection pool we mentioned above. `HTTPConnectionPool` is a container for a collection of connections to a specific host, and `pool_maxsize` is the number of connections to save that can be reused. If you're running your code in one thread, it's neither possible nor necessary to create multiple connections to the same host, because the requests library is blocking, so HTTP requests are always sent one after another.

Things are different if there are multiple threads.
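A sketch of the multithreaded experiment. The host, the thread pairing, and the `demo()` wrapper are assumptions of mine; run `demo()` manually since it hits the network:

```python
import logging
import threading
import requests
from requests.adapters import HTTPAdapter

logging.basicConfig(level=logging.DEBUG)

s = requests.Session()
# pool_maxsize=2: up to two connections to the same host are kept for reuse.
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))

def fetch():
    s.get('https://www.zhihu.com')

def demo():
    # t1 and t2 run concurrently, so two connections are opened.
    t1, t2 = threading.Thread(target=fetch), threading.Thread(target=fetch)
    t1.start(); t2.start(); t1.join(); t2.join()
    # t3 and t4 find two idle connections saved in the pool and reuse them.
    t3, t4 = threading.Thread(target=fetch), threading.Thread(target=fetch)
    t3.start(); t4.start(); t3.join(); t4.join()

# demo()  # network access; run manually
```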
See? It established two connections to the same host, `www.zhihu.com`. As I said, this can only happen in a multithreaded environment. In this case, we created a connection pool with `pool_maxsize=2`, and there were never more than two connections at the same time, so that's enough. We can see that the requests from `t3` and `t4` did not create new connections; they reused the old ones.

What if there's not enough size?
Now, with `pool_maxsize=1`, the warning came as expected:

> Connection pool is full, discarding connection: www.zhihu.com

We can also notice that since only one connection can be saved in this pool, a new connection is created again for `t3` or `t4`. Obviously this is very inefficient. That's why urllib3's documentation says that if you're planning on using a pool in a multithreaded environment, `maxsize` should be set to a higher number, such as the number of threads.

Last but not least, `HTTPAdapter` instances mounted to different prefixes are independent: each keeps its own connection pools.
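A sketch of what that independence means (the URL used for the lookup is illustrative; no network traffic occurs):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# Two independent adapters: each manages its own connection pools,
# so evictions under one prefix never touch pools under the other.
s.mount('http://', HTTPAdapter(pool_connections=1))
s.mount('https://', HTTPAdapter(pool_connections=1))

assert s.get_adapter('http://example.com') is not s.get_adapter('https://example.com')
```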
I guess that's all. I hope this article helps you understand requests better. BTW, I created a gist here which contains all of the testing code used in this article. Just download it and play with it :)
Appendix
`Session`'s `mount` method ensures that the longest prefix gets matched first. Its implementation is pretty interesting: `mount` inserts the adapter and then moves every shorter prefix behind it, so that `self.adapters` — note that it's an `OrderedDict` — stays sorted in descending order by prefix length.
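A runnable check of that behavior (the `api.example.com` host is just for illustration):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
special = HTTPAdapter(pool_connections=1)

# Mounted after the broader default 'https://' prefix, but mount()
# moves shorter prefixes behind longer ones in the OrderedDict:
s.mount('https://api.example.com', special)

# get_adapter() scans self.adapters in order, so the longest prefix wins:
assert s.get_adapter('https://api.example.com/v1') is special
# Other hosts fall back to the default 'https://' adapter:
assert s.get_adapter('https://www.example.com') is not special
```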
Thanks to @laike9m for the existing Q&A and article, but the existing answers fail to mention the subtleties of `pool_maxsize` and its relation to multithreaded code.

Summary

- `pool_connections` is the number of connections that can be kept alive in the pool at a given time for one (host, port, scheme) endpoint. If you want to keep a maximum of `n` open TCP connections around in a pool for reuse with a `Session`, you want `pool_connections=n`.
- `pool_maxsize` is effectively irrelevant for users of `requests`, because the default value of `pool_block` (in `requests.adapters.HTTPAdapter`) is `False` rather than `True`.

Detail
As correctly pointed out here, `pool_connections` is the maximum number of open connections given the adapter's prefix. It's best illustrated through example: with `pool_connections=1`, the max number of open connections is 1, for the single triple `(github.com, 443, https)`. If you want to request a resource from a new (host, port, scheme) triple, the `Session` internally will dump the existing connection to make room for a new one. You can up the number to `pool_connections=2`, then cycle between 3 unique host combinations, and you'll see the same thing in play. (One other thing to note is that the session will retain and send back cookies in this same way.)

Now for `pool_maxsize`, which is passed to `urllib3.poolmanager.PoolManager` and ultimately to `urllib3.connectionpool.HTTPSConnectionPool`. The docstring describes `maxsize` as the number of connections to save that can be reused; more than one is useful in multithreaded situations.

Incidentally, `block=False` is the default for `HTTPAdapter`, as it is for urllib3's `HTTPConnectionPool`. With `block=False`, connections beyond `pool_maxsize` are created on demand but not saved for reuse, which is why `pool_maxsize` has little effect for `HTTPAdapter` users.

Furthermore, `requests.Session()` is not thread-safe; you shouldn't use the same `session` instance from multiple threads. (See here and here.) If you really want to, the safer way to go is to lend each thread its own localized session instance, then use that session to make requests over multiple URLs, via `threading.local()`:
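A minimal sketch of that pattern. The URL and worker are illustrative, and the actual network call is left commented out:

```python
import threading
import requests

thread_local = threading.local()

def get_session():
    # Each thread lazily creates and caches its own Session;
    # threading.local() keeps the instances from being shared.
    if not hasattr(thread_local, 'session'):
        thread_local.session = requests.Session()
    return thread_local.session

def worker(url):
    session = get_session()
    # session.get(url)  # uncomment to actually fetch

threads = [threading.Thread(target=worker, args=('https://example.com',))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```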
Requests uses urllib3 to manage its connections and other features.

Re-using connections is an important factor in keeping recurring HTTP requests performant, as the urllib3 README emphasizes.
To answer your question, `pool_maxsize` is the number of connections to keep around per host (this is useful for multi-threaded applications), whereas `pool_connections` is the number of host-pools to keep around. For example, if you're connecting to 100 different hosts with `pool_connections=10`, then only the most recent 10 hosts' connections will be re-used.
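That host-pool cache can be observed directly at the urllib3 layer: `num_pools` is the `PoolManager` parameter that requests' `pool_connections` feeds into. The hosts below are illustrative, and no network traffic occurs, since pools open connections lazily:

```python
from urllib3.poolmanager import PoolManager

# num_pools is the urllib3 knob behind requests' pool_connections:
# the number of per-host pools kept in an LRU-style cache.
pm = PoolManager(num_pools=2)

a = pm.connection_from_host('a.example.com', 443, scheme='https')
b = pm.connection_from_host('b.example.com', 443, scheme='https')
c = pm.connection_from_host('c.example.com', 443, scheme='https')  # evicts a's pool (LRU)

# Asking for a's pool again yields a fresh object: the old one was discarded.
assert pm.connection_from_host('a.example.com', 443, scheme='https') is not a
```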