
Question:

I'm trying to make a few microservices more resilient and retrying certain types of HTTP requests would help with that.

Retrying timeouts will give clients a terribly slow experience, so I don't intend to retry in this case. Retrying 400s doesn't help because a bad request will remain a bad request a few milliseconds later.

I imagine there are other reasons to not retry a few other types of errors, but which errors and why?

Answer 1:

There are some errors that should not be retried because they seem permanent:

400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
409 Conflict - it depends
410 Gone
411 Length Required
412 Precondition Failed
413 Payload Too Large
414 URI Too Long
415 Unsupported Media Type
416 Range Not Satisfiable
417 Expectation Failed
418 I'm a teapot - not sure about this one
421 Misdirected Request
422 Unprocessable Entity
423 Locked - it depends on how long a resource stays locked on average (?)
424 Failed Dependency
426 Upgrade Required - can the client be upgraded automatically?
428 Precondition Required - I don't think the precondition can be fulfilled the second time without retrying from the beginning of the whole process, but it depends
429 Too Many Requests - it depends, but it should not be retried too fast
431 Request Header Fields Too Large
451 Unavailable For Legal Reasons

So, most of the 4** Client errors should not be retried.

The 5** server errors that should not be retried:

500 Internal Server Error
501 Not Implemented
502 Bad Gateway - I have seen it used for temporary errors, so it depends
505 HTTP Version Not Supported
506 Variant Also Negotiates
507 Insufficient Storage
508 Loop Detected
510 Not Extended
511 Network Authentication Required
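
As a rough illustration, the two lists above could be turned into a small lookup. This is only a sketch: the function name `classify` and the three labels are made up for this example, the "maybe" set contains just the codes annotated with "it depends" above, and treating unlisted 5** codes (such as 503 and 504) as retryable is an assumption drawn from their absence from the list.

```python
# Codes the lists above annotate with "it depends". Handling them as
# "maybe" is an assumption of this sketch, not a rule from any standard.
DEPENDS = {409, 423, 428, 429, 502}

# 5** codes listed above as not worth retrying.
PERMANENT_5XX = {500, 501, 505, 506, 507, 508, 510, 511}


def classify(status: int) -> str:
    """Return 'retry', 'maybe', or 'no' for an HTTP status code."""
    if status in DEPENDS:
        return "maybe"
    if 400 <= status < 500:
        # Most client errors stay errors until the caller changes the request.
        return "no"
    if status in PERMANENT_5XX:
        return "no"
    if 500 <= status < 600:
        # Server errors missing from the list above, e.g. 503 and 504.
        return "retry"
    # 1xx, 2xx and 3xx: nothing to retry.
    return "no"
```

A caller would check `classify(response_status)` before scheduling another attempt, and apply its own policy (for example a longer delay) to the "maybe" group.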

However, to make the microservices more resilient, you should use the circuit breaker pattern and fail fast when the upstream is down.
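
To make the idea concrete, here is a minimal circuit breaker sketch. The class name, the `max_failures` and `reset_after` parameters, and their defaults are all hypothetical; a production setup would more likely use an existing library and would need thread safety, which this sketch ignores.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying the upstream."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast: do not touch the upstream at all.
                raise CircuitOpenError("upstream marked as down")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each upstream request in `breaker.call(...)` means that once the upstream has failed `max_failures` times in a row, callers get an immediate `CircuitOpenError` instead of waiting on a service that is known to be down.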

Answer 2:

4xx codes mean that an error has been made on the caller's side. That could be a bad URL, bad authentication credentials, or anything else that makes the request invalid. Without fixing that problem, there is no point in retrying. The error is in the caller's domain, and the caller should fix it instead of hoping that it will fix itself.

There are exceptions. Imagine the service is being redeployed or restarted. At that instant there is no endpoint registered, so the caller receives a 4xx HTTP code. A moment later, however, the server could be available again. A retry might therefore seem beneficial.

A closer look shows that a service should be restarted with a rolling restart to prevent an outage, in which case the previous argument no longer holds. If your environment does not follow this practice and you believe client-side errors (4xx codes) are worth retrying for the reason above, you may choose to do so. Mature systems do not, because retrying brings no perceived benefit and costs them the ability to fail fast.

5xx error codes should be retried, as those are service errors. They could be short term (overflowing threads, a dependent service refusing connections) or long term (a system defect, a dependent system outage, unavailable infrastructure). Sometimes services include information in the response (often in headers) indicating whether the failure is permanent or temporary, and sometimes a time parameter saying when to retry. Based on these parameters, callers can choose whether to retry.
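
One common form of such a time parameter is the standard `Retry-After` response header. The sketch below picks a wait time from it and otherwise falls back to capped exponential backoff. The function name `retry_delay` and the `base` and `cap` defaults are assumptions for illustration. It only handles the delay-in-seconds form of the header (the HTTP-date form is ignored), and it expects `headers` to be a plain dict keyed exactly as `"Retry-After"`.

```python
def retry_delay(headers, attempt, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        # The server said how long to wait, in seconds. Capping the hint is
        # a design choice of this sketch so a caller never blocks too long.
        return min(float(value), cap)
    # No usable hint: fall back to capped exponential backoff.
    return min(base * 2 ** attempt, cap)
```

For example, `retry_delay({"Retry-After": "5"}, 0)` gives `5.0`, while `retry_delay({}, 3)` gives `4.0` from the backoff formula. A caller that prefers to give up when the server asks for a long wait could compare the header value against `cap` and stop retrying instead of clamping it.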

1xx, 2xx and 3xx codes need not be retried for obvious reasons.