We are processing messages that come in periodically. We use the Codahale/Dropwizard Metrics "Timer" to measure the time it takes to process them.
I found someone had the same issue here:
"problem with exponentially decaying reservoir is that if no new data comes in, it will keep on giving the same numbers all the times. For example, let say you update a timer with 5 and 7 (then don't put anything at all) , then no matter when you see (even after x hours), timer will still show the average to be 6 which is not representative of last 5 mins by any means.
so, it works only if data is arriving all the time."
As you can see with the dark blue line:
But no solution is suggested there, and they say it's not going to be implemented: https://github.com/dropwizard/metrics/issues/399
How can I reset these timers properly or how should I visualize it so it's not confusing?
Note: It is too long for comment.
Using the SlidingTimeWindowReservoir will cover most of the use cases.
But as pointed out in this comment there could be a problem depending on the number of events:
"it keeps all the measurements in the window in-memory which becomes unacceptable at large number of events"
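To make that trade-off concrete, here is a minimal, stdlib-only sketch of the sliding-time-window idea. It is not the Metrics implementation: the class name `SlidingWindowSketch` and the injectable clock are assumptions made for illustration. In Metrics itself the equivalent wiring is to pass a `SlidingTimeWindowReservoir` to the `Timer` constructor, as noted in the comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Simplified illustration of a sliding time window (hypothetical class,
 * NOT the Metrics code). With the real library the wiring would be:
 *   new Timer(new SlidingTimeWindowReservoir(5, TimeUnit.MINUTES))
 */
public class SlidingWindowSketch {
    private static final class Sample {
        final long timestampNanos;
        final long value;
        Sample(long timestampNanos, long value) {
            this.timestampNanos = timestampNanos;
            this.value = value;
        }
    }

    // Every sample inside the window is kept, so memory grows with event rate.
    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long windowNanos;
    private final LongSupplier clock; // injectable so the demo can move time

    public SlidingWindowSketch(long windowNanos, LongSupplier clock) {
        this.windowNanos = windowNanos;
        this.clock = clock;
    }

    public synchronized void update(long value) {
        trim();
        samples.addLast(new Sample(clock.getAsLong(), value));
    }

    public synchronized int size() {
        trim();
        return samples.size();
    }

    /** Mean of the samples still inside the window, 0.0 if there are none. */
    public synchronized double mean() {
        trim();
        if (samples.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (Sample s : samples) {
            sum += s.value;
        }
        return (double) sum / samples.size();
    }

    // Drop samples older than the window; this is what "resets" stale values.
    private void trim() {
        long cutoff = clock.getAsLong() - windowNanos;
        while (!samples.isEmpty() && samples.peekFirst().timestampNanos < cutoff) {
            samples.removeFirst();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};
        SlidingWindowSketch r =
                new SlidingWindowSketch(TimeUnit.MINUTES.toNanos(5), () -> now[0]);
        r.update(5);
        r.update(7);
        System.out.println(r.mean());                  // prints 6.0
        now[0] = TimeUnit.HOURS.toNanos(1);            // an hour with no new data
        System.out.println(r.size() + " " + r.mean()); // prints 0 0.0
    }
}
```

After an idle hour the 5 and 7 from the question are gone, so the stale average of 6 is no longer reported. The cost is the one the comment describes: one stored entry per event in the window.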
Could we do better? Let's continue searching. With some luck we find this blog post, which describes exactly your kind of problem. It links to their simple, dirty solution and also suggests using HdrHistogram.
Also on the metrics mailing list there are several messages about exactly this problem.
Some of them point to Marshall Pierce/hdrhistogram-metrics-reservoir. For what HdrHistogram is and why to use it to measure latencies, check the project description.
And finally, after some more digging, you could also find the vladimir-bukhtoyarov/metrics-core-hdr project. It uses HdrHistogram as well.
So there are two similar libraries that use the same data structure and claim to solve the problem case you have hit.