How many times will TCP retransmit

Posted 2020-08-20 06:36

Question:

In the case of a half-open connection where the server crashes (no FIN or RST sent to the client), and the client attempts to send some data on this broken connection, each TCP segment will go unACKed. TCP will attempt to retransmit packets after some timeout. How many times will TCP attempt to retransmit before giving up, and what happens in this case? How does it inform the operating system that the host is unreachable? Where is this specified in the TCP RFC?

Answer 1:

If the server program crashes, the kernel will clean up all open sockets appropriately. (Well, appropriate from a TCP point of view; it might violate the application layer protocol, but applications should be prepared for this event.)
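The process-crash case can be observed locally. Because the kernel closes every descriptor a dead process held, the peer receives a normal FIN and sees end-of-file. The POSIX-only sketch below assumes os.fork() is available and uses os._exit() in the child (exiting without close()) as a stand-in for a crash; the function name is hypothetical.

```python
import os
import socket

def demo_process_exit_closes_socket() -> bytes:
    """Hypothetical demo: a child accepts a connection, then 'crashes'."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(1)
    addr = srv.getsockname()

    pid = os.fork()
    if pid == 0:
        # Child: accept, then exit abruptly without calling close().
        # try/finally guarantees the child never falls through into parent code.
        try:
            conn, _ = srv.accept()
        finally:
            os._exit(1)

    # Parent: connect; the child still holds its own copy of the listener.
    cli = socket.create_connection(addr)
    srv.close()
    data = cli.recv(1024)  # b"" once the kernel sends FIN for the dead child
    cli.close()
    os.waitpid(pid, 0)
    return data

print(repr(demo_process_exit_closes_socket()))
```

An empty bytes object from recv() is how Python reports end-of-file, i.e. the FIN the kernel sent on the crashed process's behalf.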

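If the server kernel crashes and does not come back up, the number and timing of retries depend on whether the connection was already established (connection attempts are governed by a separate setting, tcp_syn_retries). For established connections, Linux uses: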

   tcp_retries1 (integer; default: 3; since Linux 2.2)
          The number of times TCP will attempt to
          retransmit a packet on an established connection
          normally, without the extra effort of getting
          the network layers involved.  Once we exceed
          this number of retransmits, we first have the
          network layer update the route if possible
          before each new retransmit.  The default is the
          RFC specified minimum of 3.

   tcp_retries2 (integer; default: 15; since Linux 2.2)
          The maximum number of times a TCP packet is
          retransmitted in established state before giving
          up.  The default value is 15, which corresponds
          to a duration of approximately between 13 to 30
          minutes, depending on the retransmission
          timeout.  The RFC 1122 specified minimum limit
          of 100 seconds is typically deemed too short.

(From tcp(7).)
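The 13-to-30-minute range comes from exponential backoff: each retransmission doubles the retransmission timeout (RTO) up to a cap. The sketch below assumes Linux's constants of a 200 ms minimum RTO and a 120 s maximum RTO, and ignores the RTT-based RTO estimate that raises the starting value on real paths; the names are illustrative. Starting from the minimum RTO, 15 retries add up to 924.6 s, which matches the "hypothetical timeout" the Linux ip-sysctl documentation gives for tcp_retries2 = 15.

```python
TCP_RTO_MIN_MS = 200      # assumed: Linux minimum RTO
TCP_RTO_MAX_MS = 120_000  # assumed: Linux maximum RTO

def retransmit_schedule(retries: int, initial_rto_ms: int = TCP_RTO_MIN_MS) -> list[int]:
    """Waits (ms) before each retransmission, plus the final wait before giving up.

    Element 0 is the wait before the 1st retransmission; the last element is the
    wait after the last retransmission, after which the connection is abandoned.
    """
    waits = []
    rto = min(initial_rto_ms, TCP_RTO_MAX_MS)
    for _ in range(retries + 1):
        waits.append(rto)
        rto = min(rto * 2, TCP_RTO_MAX_MS)  # exponential backoff, capped
    return waits

schedule = retransmit_schedule(15)
print(f"{len(schedule) - 1} retransmissions, gives up after ~{sum(schedule) / 1000:.1f} s")
# prints: 15 retransmissions, gives up after ~924.6 s
```

With a 1 s starting RTO, as on a slower path, the same 15 retries take about 1207 s (roughly 20 minutes), which is why the man page quotes a range rather than a single figure.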

If the server kernel crashes and does come back up, it won't know about any of the old connections, so it answers the client's follow-on segments with RST, and the client detects the failure much faster.
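The effect of a RST can be simulated on one machine. A rebooted kernel cannot be reproduced locally, so the sketch below assumes an abortive close as a stand-in: setting SO_LINGER with a zero linger time makes close() send RST instead of FIN. The client's next read then fails with ECONNRESET. The function name is hypothetical.

```python
import errno
import socket
import struct

def demo_rst_gives_econnreset() -> int:
    """Hypothetical demo: return the errno the client sees after a RST."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()

    # struct linger {int l_onoff; int l_linger;} = (1, 0): abort with RST on close().
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()
    srv.close()

    try:
        cli.recv(1024)
    except ConnectionResetError as exc:  # Python's OSError subclass for ECONNRESET
        return exc.errno
    finally:
        cli.close()
    return 0  # no reset observed

print(errno.errorcode.get(demo_rst_gives_econnreset()))
```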

If a single-point-of-failure router along the path crashes and comes back up quickly enough, the connection may continue working. This requires that firewalls and routers be stateless or, if they are stateful, that their rulesets allow preexisting connections to continue. (This is potentially unsafe, and firewall administrators have different policies about it.)

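The failure is reported to the program through errno on a later socket call such as send(2) or recv(2): ECONNRESET when the peer answers with RST, and, on Linux, ETIMEDOUT when the retransmissions run out (or a soft error such as EHOSTUNREACH if an ICMP error was received in the meantime).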
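Because send() usually just copies data into the kernel's send buffer, the error tends to surface on a later call, so applications typically inspect errno wherever they catch the exception. The sketch below is a hypothetical helper that maps common dead-peer errno values to a likely cause. It assumes Linux semantics: ETIMEDOUT after tcp_retries2 is exhausted, ECONNRESET after a RST, and EPIPE for writes after the connection is already gone.

```python
import errno

# Hypothetical lookup table: errno value -> likely cause on a TCP socket.
LIKELY_CAUSE = {
    errno.ECONNRESET: "peer answered with RST (e.g. it rebooted and forgot the connection)",
    errno.EPIPE: "write on a connection that has already been reset or shut down",
    errno.ETIMEDOUT: "retransmissions exhausted without an ACK (tcp_retries2 on Linux)",
    errno.EHOSTUNREACH: "an ICMP host-unreachable error was received",
    errno.ENETUNREACH: "an ICMP network-unreachable error was received",
}

def describe_socket_error(exc: OSError) -> str:
    """Map the OSError raised by send()/recv()/sendall() to a likely cause."""
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    return f"{name}: {LIKELY_CAUSE.get(exc.errno, 'not a typical dead-peer error')}"

# Usage with a synthetic error; in real code, wrap sock.sendall()/sock.recv()
# in try/except OSError and pass the caught exception in.
print(describe_socket_error(OSError(errno.ETIMEDOUT, "Connection timed out")))
```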