We're trying to tune an application that accepts messages via TCP and also uses TCP for some of its internal messaging. While load testing, we noticed that response time degrades significantly (and then stops altogether) as more simultaneous requests are made to the system. During this time, we see a lot of TCP connections in TIME_WAIT status, and someone suggested lowering the TIME_WAIT kernel setting from its default of 60 seconds to 30.
From what I understand, the TIME_WAIT setting essentially controls how long a closed connection's resources are held before they are made available to the system again.
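For context, this is roughly how we're watching connection states during a load test; a minimal sketch assuming a Linux host (it just parses /proc/net/tcp the same way netstat would, and the hex state code 06 for TIME_WAIT comes from the kernel's TCP state table):

```python
# Count sockets per TCP state by parsing /proc/net/tcp and /proc/net/tcp6.
# Assumes a Linux host; state code 06 is TIME_WAIT in the kernel's table.
from collections import Counter

STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_tcp_states():
    counts = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip the header line
                for line in f:
                    state_hex = line.split()[3]  # 4th column is the state code
                    counts[STATES.get(state_hex, state_hex)] += 1
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled
    return counts

if __name__ == "__main__":
    for state, n in count_tcp_states().most_common():
        print(f"{state:12} {n}")
```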
I'm not a "network guy" and know very little about these things. I need a lot of what's in that linked post, but "dumbed down" a little.
- I think I understand why the TIME_WAIT value can't be set to 0, but can it safely be set to 5? What about 10? What determines a "safe" setting for this value?
- Why is the default for this value 60? I'm guessing that people a lot smarter than me had good reason for selecting this as a reasonable default.
- What else should I know about the potential risks and benefits of overriding this value?
Setting tcp_tw_reuse is more useful than changing TIME_WAIT, as long as you have the parameter available (kernels 3.2 and above; unfortunately that disqualifies all versions of RHEL and XenServer).
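If you are on a kernel that has it, enabling it is just a sysctl write; a minimal sketch assuming a Linux host and root privileges (equivalent to `sysctl -w net.ipv4.tcp_tw_reuse=1`, and not persistent across reboots unless you also add it to /etc/sysctl.conf):

```python
# Check, and optionally enable, net.ipv4.tcp_tw_reuse via procfs.
# Assumes a Linux host; writing requires root.
PARAM = "/proc/sys/net/ipv4/tcp_tw_reuse"

def get_tw_reuse() -> int:
    with open(PARAM) as f:
        return int(f.read().split()[0])

def enable_tw_reuse() -> None:
    with open(PARAM, "w") as f:
        f.write("1\n")

if __name__ == "__main__":
    print("tcp_tw_reuse =", get_tw_reuse())
```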
Dropping the value, particularly for VPN-connected users, can result in constant recreation of proxy tunnels on the outbound connection. With the default NetScaler (XenServer) config, which is lower than the default Linux config, Chrome will sometimes have to recreate the proxy tunnel up to a dozen times to retrieve one web page. Applications that don't retry, such as Maven and Eclipse P2, simply fail.
The original motive for the setting (avoiding duplicate segments from an old connection being accepted by a new one) was made largely redundant by the TCP timestamps extension (RFC 1323), which specifies timestamp inclusion on TCP segments.
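One related check: tcp_tw_reuse only helps when TCP timestamps are enabled (they are on by default on most Linux distributions); a quick way to confirm, again assuming a Linux host:

```python
# TCP timestamps (RFC 1323) must be on for tcp_tw_reuse to have any effect.
# A value of 1 means enabled, which is the usual default.
with open("/proc/sys/net/ipv4/tcp_timestamps") as f:
    print("tcp_timestamps =", f.read().strip())
```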