Graphing slow counters with Prometheus and Grafana

Published 2019-03-11 10:53

Question:

We graph fast counters with sum(rate(my_counter_total[1m])) or with sum(irate(my_counter_total[20s])), where the second one is preferable if you can always expect changes within the last couple of seconds.

But how do you graph slow counters where you only have some increments every couple of minutes or even hours? Having values like 0.0013232/s is not very human-friendly.

Let's say I want to graph how many users sign up to our service (we expect a couple of signups per hour). What's a reasonable query?

We currently use the following to graph that in Grafana:

  • Query: 3600 * sum(rate(signup_total[1h]))
  • Step: 3600s
  • Resolution: 1/1

Is this reasonable?

I'm still trying to understand how all those parameters play together to draw a graph. Can someone explain how the range selector ([10m]), the rate() and irate() functions, and the Step and Resolution settings in Grafana influence each other?

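Answer 1:

That's a correct way to do it. You can also use increase(), which is syntactic sugar for using rate() that way: increase(signup_total[1h]) is rate(signup_total[1h]) multiplied by the number of seconds in the range.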
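To make the relationship between the two forms concrete, here is a minimal Python sketch. It is a simplified model, not Prometheus's actual implementation: it skips the extrapolation to the window boundaries that Prometheus applies, and the sample data and the helper names raw_increase and simple_rate are made up for illustration.

```python
def raw_increase(samples):
    """Total counter increase across (timestamp, value) samples, tolerating resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count the new value in full.
        total += cur if cur < prev else cur - prev
    return total


def simple_rate(samples, window_seconds):
    """Average per-second rate over the window (simplified: no boundary extrapolation)."""
    return raw_increase(samples) / window_seconds


WINDOW = 3600  # seconds, i.e. the [1h] range selector

# Hypothetical scrape data: one sample per minute, three signups during the hour.
samples = [
    (t, 100 + (t >= 900) + (t >= 2000) + (t >= 3300))
    for t in range(0, WINDOW + 1, 60)
]

per_second = simple_rate(samples, WINDOW)  # tiny, hard-to-read number
per_hour = WINDOW * per_second             # the 3600 * rate(...[1h]) form
increase = raw_increase(samples)           # the increase(...[1h]) form

print(per_second, per_hour, increase)
```

In this model the scaled rate and the increase agree up to floating-point rounding, which is why the two queries can be used interchangeably.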

Can someone explain how the range selector

This is used only by Prometheus; it determines the window of data each evaluation works over.

the Step and Resolution settings in grafana influence each other?

This is used on the Grafana side; it affects how many time slices Grafana requests from Prometheus.

These settings do not directly influence each other. However, the step that the resolution works out to should be no larger than the range, or you'll be undersampling and miss information.
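The undersampling effect can be illustrated with a small Python sketch. It assumes each graph point is evaluated over the window (t - range, t] and that points are spaced one step apart; the event timestamps and the helper covered_increments are hypothetical.

```python
def covered_increments(event_times, start, end, step, range_seconds):
    """Count events that fall inside at least one evaluation window (t - range, t]."""
    eval_times = range(start, end + 1, step)  # one graph point per step
    covered = 0
    for e in event_times:
        if any(t - range_seconds < e <= t for t in eval_times):
            covered += 1
    return covered


# Hypothetical signup timestamps, in seconds since the start of the graph.
events = [300, 1500, 2700, 3500, 5000, 7100]

# Step of 1h with a 10m range: most of each hour is never looked at.
undersampled = covered_increments(events, 0, 7200, step=3600, range_seconds=600)

# Step of 1h with a 1h range: the windows tile the whole time axis.
matched = covered_increments(events, 0, 7200, step=3600, range_seconds=3600)

print(undersampled, matched)
```

With the 10m range only the events that happen shortly before an evaluation time are seen, while a range equal to the step accounts for every event exactly once.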