I am trying to improve the performance of a web app. Profiling the app itself, I found its response times are quite acceptable (100ms-200ms), but when I test the app with ApacheBench, the response time sometimes exceeds 1 second. Looking closely at the logs, I occasionally see a big discrepancy between request_time and upstream_response_time:
"GET /wsq/p/12 HTTP/1.0" 200 114081 "-" "ApacheBench/2.3" 0.940 0.286
"GET /wsq/p/31 HTTP/1.0" 200 114081 "-" "ApacheBench/2.3" 0.200 0.086
The upstream_response_time is quite close to what I measured when profiling the web app, but request_time is close to one second for the first request. What could cause this discrepancy?
I understand that request_time is measured from the first byte received to the last response byte sent, so it can be affected by network conditions and client-side problems. What should I do to reduce the average request_time as much as possible?