-->

How is “app server” time related to “browser time”

Posted 2019-03-22 08:51

Question:

I'm monitoring a PHP app with NewRelic, and I'm very confused about some of the numbers shown in the overview of my application.

My app consists of a PHP webapp, that serves pages to web browsers on one side (obviously :), and performs requests to a Java backend on the other side:

Browser <--> PHP Webapp --> Java Backend

I know for a fact that some of the Java backend requests can take up to 15 seconds to complete, so from the browser's point of view the whole webapp takes that long to respond.

In the overview panel of newrelic (APM > My App > Monitoring > Overview), it says that my "app server" time is on average 1560 ms, and that my "browser time" is 5.63 secs (I have enabled browser monitoring). Furthermore, the "Transactions" section shows transactions taking up to 11.6 secs to complete.

So... how do all these different time measurements relate to each other? To summarize, I have:

  • PHP "app server" time: 1560 ms
  • PHP app "browser time": 5.63 secs
  • "Transactions" time: 11.6 secs

How can I make sense of these numbers? I would have expected the "browser time" to be the highest one (since it includes all the others, both PHP and Transactions processing). Do they add up in some way? Are some of them a breakdown of the others?

Note: I'm aware that in newrelic it's all about average times relative to the time window being analyzed, but still, this doesn't make sense to me.

Thanks!

Answer 1:

Well, I finally figured this out :) The key concept I was missing here was "percentiles". Let me explain a little bit.

In my question, I mentioned I was getting average response times of 1560ms, which didn't seem to make sense given that our backend always has to process for about 15 secs to produce a response. The following picture is what I'm getting in the "overview" of my webapp.

As you can see, average response times don't seem to be that bad. However, I'm also seeing Transactions that take up to 15 secs.

Next, if you expand the "Web Transactions response time" selector and select the percentage sign ("%"), you will get the "Percentiles" graph. Mine is as follows:

In this new graph:

  • The green line represents the average response time, which corresponds to the green area of the first graph. Here we see that transactions do in fact take an average of under 2 secs to complete. So far so good.
  • The orange-ish line corresponds to the "95%". This is the key to understanding how all these numbers come together. This "95%" is the "95th percentile" of your requests, meaning that 95% of your requests take less than this time. But of course it also means that 5% of your requests are taking more than that!
  • The blue line corresponds to the "99%" or "99th percentile" of your requests, meaning that 99% of your requests take less than this time, but again, 1% take more.
  • The red line corresponds to the "median", which is in fact a synonym for "50%" or "50th percentile". At this point you can imagine what this is: 50% of your requests take less than this time, and the other 50% take more (hence the name "median"). Interestingly, this measure is considerably different from the average: the average sums up all times and divides by the total number of transactions, so the transactions at the extremes of the sampled times get hidden in the high volume of the sample.
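The difference between the four lines can be reproduced with a minimal Python sketch. The response times below are made up for illustration (they are not the data behind the graphs), the `percentile` helper is hypothetical, and the nearest-rank method it uses is an assumption; New Relic may compute its percentiles differently.

```python
from statistics import mean, median
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at
    least pct% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical sample (in ms): 92 fast requests and 8 very slow ones.
response_times_ms = [200] * 92 + [15000] * 8

summary = {
    "average": mean(response_times_ms),
    "median": median(response_times_ms),
    "p95": percentile(response_times_ms, 95),
    "p99": percentile(response_times_ms, 99),
}

for name, value in summary.items():
    print(name, value, "ms")
# average is 1384 ms and median is 200 ms, while p95 and p99 are both 15000 ms
```

With this sample the average stays under 2 secs even though 8% of the requests take 15 secs; only the 95th and 99th percentiles expose them.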

Now, it all begins to make sense. On average, my requests are in fact taking no more than 2 secs. But I have so many requests that are extremely fast (those below the red line) that the ones taking the incredible amount of time of 15 secs are not noticeable in the average. Those are evident only when you look at the long tail of your sampled requests, i.e. the 95th and 99th percentiles.
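As a rough back-of-the-envelope check, here is a sketch under a deliberately simplified assumption: every request is either "fast" (about 200 ms, an assumed figure) or "slow" (about 15 secs). Under that two-value model, the share of slow requests needed to produce the 1560 ms average from the question can be computed directly. The function name `slow_fraction` is hypothetical.

```python
def slow_fraction(avg_ms, fast_ms, slow_ms):
    """Solve (1 - p) * fast_ms + p * slow_ms == avg_ms for p,
    the fraction of requests that are slow."""
    return (avg_ms - fast_ms) / (slow_ms - fast_ms)

# 1560 ms average from the question; 200 ms and 15000 ms are assumed values.
p = slow_fraction(avg_ms=1560, fast_ms=200, slow_ms=15000)
print(f"{p:.1%}")  # prints 9.2%
```

So under this simplification, fewer than one request in ten needs to be slow for the average to sit at about 1.5 secs, which is why the average alone hides them.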

To wrap it up, this can be confirmed by selecting the "histogram" option in the graph. Mine is as follows:

Notice that the vast majority of requests take under 200ms, but 8.29% of transactions also take more than 7 secs to complete (and if we could scroll to the right of the histogram, we would find that the requests taking more than 15 secs are in the last 5% and 1%, consistent with the percentile analysis we did before).
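The same kind of view can be sketched in a few lines of Python. The bucket bounds, labels, and sample data are all hypothetical (they are not the buckets New Relic uses), and `histogram_shares` is an illustrative helper rather than any New Relic API.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical inclusive upper bounds (ms); slower requests go to the last bucket.
BOUNDS_MS = [250, 1000, 7000]
LABELS = ["<=250ms", "<=1s", "<=7s", ">7s"]

def histogram_shares(samples_ms):
    """Return the percentage of samples that falls into each bucket."""
    counts = Counter(LABELS[bisect_left(BOUNDS_MS, t)] for t in samples_ms)
    total = len(samples_ms)
    return {label: 100 * counts[label] / total for label in LABELS}

# Made-up sample: mostly fast requests plus a slow tail.
samples_ms = [150] * 85 + [600] * 5 + [3000] * 2 + [15000] * 8
shares = histogram_shares(samples_ms)
print(shares)
```

Here 85% of the requests land in the fastest bucket and 8% in the ">7s" bucket, which is the same shape as the histogram described above.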

(This article pointed me in the right direction: https://blog.newrelic.com/2013/10/23/histograms-percentiles-new-relic-style/)

This had me confused for a long time; I hope it helps someone!