I am testing the alert functionality in Azure Application Insights. Either it is buggy or I don't know how to use it.
If I create a new alert based on the 'Server Exceptions' metric, it seems to fire once and then never again. Once it fires, it goes into an 'Active' state, shown as an orange triangle with an exclamation mark. See the image below. I also created a new alert that I haven't triggered, and as the image shows, it has a green circle with a tick.
This suggests to me that an alert won't fire again until someone 'acknowledges' it, which is not a bad idea, but I can't see how to do that.
Edit:
I have just tried the 'Exception Rate' metric as suggested, but I think the minimum threshold for firing the alert would be an average of 1 exception per second over a 5-minute period.
I must say it seems strange that my use case isn't handled. I have a lightweight Web API service that is so simple it should never fail, but it could, and if an exception does occur I want to receive an alert straight away.
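To put that concern in numbers: if the smallest usable threshold really were an average of 1 exception per second, a 5-minute window would need 300 exceptions before the alert fires, while a single exception averages out to only about 0.0033 exceptions per second. A minimal Python sketch of the arithmetic follows; `average_rate` is a hypothetical helper, and it assumes the rate is simply the count divided by the window length in seconds, which may not match exactly how Application Insights aggregates the metric.

```python
def average_rate(exception_count: int, window_seconds: int) -> float:
    """Hypothetical helper: average exceptions per second over a window."""
    return exception_count / window_seconds

WINDOW = 5 * 60  # 5-minute window, in seconds

# Exceptions needed in the window to reach an average of 1 per second.
needed_for_one_per_second = 1 * WINDOW

# Average rate produced by a single exception in the same window.
single = average_rate(1, WINDOW)

print(needed_for_one_per_second)  # 300
print(round(single, 5))           # 0.00333
```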
An alert is supposed to resolve, and its state to return to green, when the alert's condition is no longer met.
This is exceptionally hard to achieve with "Count" metrics, because they keep going up and almost never come down. Once the alert has fired, it won't resolve, because the value of the metric stays above the threshold the whole time.
Try setting the alert on the "Rate" metric instead; you should see the state return to green once the rate is back within the limits you set.
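The difference between alerting on "Count" and on "Rate" can be illustrated with a toy simulation. As a simplification of the answer's description, it assumes the count metric behaves like a running total that never decreases, while the rate metric averages exceptions over a trailing window. The per-minute data, thresholds, window length and function names are all made up for illustration; this is not how Application Insights is implemented internally.

```python
from itertools import accumulate

# Hypothetical per-minute exception counts: a short burst, then silence.
per_minute = [0, 3, 2, 0, 0, 0, 0, 0, 0, 0]

WINDOW_MIN = 5         # trailing window for the rate, in minutes (assumed)
COUNT_THRESHOLD = 2    # alert while the running total exceeds this
RATE_THRESHOLD = 0.5   # alert while exceptions per minute over the window exceed this

def states_for_count(counts, threshold):
    """Alert state per minute when the metric is a running total that never decreases."""
    return ["Active" if total > threshold else "Resolved" for total in accumulate(counts)]

def states_for_rate(counts, window, threshold):
    """Alert state per minute when the metric is the average rate over a trailing window."""
    states = []
    for i in range(len(counts)):
        recent = counts[max(0, i - window + 1): i + 1]
        rate = sum(recent) / window
        states.append("Active" if rate > threshold else "Resolved")
    return states

count_states = states_for_count(per_minute, COUNT_THRESHOLD)
rate_states = states_for_rate(per_minute, WINDOW_MIN, RATE_THRESHOLD)
print(count_states[-1])  # Active: the running total stays at 5
print(rate_states[-1])   # Resolved: the burst has slid out of the window
```

Once the burst leaves the 5-minute window, the rate-based state goes back to "Resolved", while the count-based state never does.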
This is now fixed. Please let us know if you see any issues. Some things to keep in mind:
- Alert rules are evaluated on a sliding window: an alert triggers or resolves based on how the condition evaluates over the sliding window as of the instant a sample arrives.
- A caveat to the above for exception-count-based alert rules: we will resolve an alert if no exceptions are reported for the time window configured in the rule.
- Note: this differs from metric-based rules; for those, a lack of data does not cause the alert to be resolved.
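The evaluation rules above can be sketched roughly as follows. This is a minimal model under stated assumptions, not the service's actual logic: the names `Sample` and `evaluate` are hypothetical, values inside the window are summed, and the window is taken as `(now - window, now]`. It shows only the documented difference on missing data: an exception-count rule resolves, while a metric rule keeps its previous state.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    timestamp: float  # seconds
    value: float      # e.g. number of exceptions reported in this sample

def evaluate(samples: List[Sample], now: float, window: float, threshold: float,
             exception_count_rule: bool, previous_state: str) -> str:
    """Hypothetical evaluation of one rule over the window (now - window, now]."""
    in_window = [s.value for s in samples if now - window < s.timestamp <= now]
    if not in_window:
        # No data in the window: exception-count rules resolve,
        # metric rules keep whatever state they were already in.
        return "Resolved" if exception_count_rule else previous_state
    # Assumed aggregation: sum the values and compare with the threshold.
    return "Active" if sum(in_window) > threshold else "Resolved"

WINDOW = 300  # 5-minute window, in seconds
samples = [Sample(timestamp=10, value=1)]

# Shortly after the exception, both kinds of rule are active.
print(evaluate(samples, 20, WINDOW, 0, True, "Resolved"))   # Active
# Once the window holds no data, only the exception-count rule resolves.
print(evaluate(samples, 400, WINDOW, 0, True, "Active"))    # Resolved
print(evaluate(samples, 400, WINDOW, 0, False, "Active"))   # Active
```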
As of 2018, the "Server exception" metric works as the OP expects. My use case:
The goal was to get an email whenever an exception occurred.
Use the "Server exception" metric.
An alert on that metric auto-resolves once the rule's period has passed after the initial alert without the error occurring again.
So you get the initial "Alert"; then, after 5 minutes with no exceptions, it returns to a "Healthy" state.
Because it has auto-resolved, if the error happens again tomorrow, it will raise the "Alert" again.
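The resulting sequence of notifications can be modelled with a small state machine. This is a hypothetical sketch, not Azure's implementation: `notifications` and `PERIOD` are made-up names, the period is assumed to be 5 minutes as in the answer, and the quiet period is assumed to be measured from the most recent exception.

```python
from typing import List, Tuple

PERIOD = 5 * 60  # rule period in seconds (assumed 5 minutes, as in the answer)

def notifications(exception_times: List[float], period: float = PERIOD) -> List[Tuple[float, str]]:
    """Hypothetical model: raise "Alert" on an exception while healthy, and return
    to "Healthy" once `period` seconds pass with no further exception."""
    events: List[Tuple[float, str]] = []
    active = False
    last_exception = 0.0
    for t in sorted(exception_times):
        if active and t - last_exception >= period:
            # The quiet period elapsed before this exception: auto-resolve first.
            events.append((last_exception + period, "Healthy"))
            active = False
        if not active:
            events.append((t, "Alert"))
            active = True
        last_exception = t
    if active:
        events.append((last_exception + period, "Healthy"))
    return events

DAY = 24 * 60 * 60
# One exception now, another 30 seconds later, and a third one tomorrow.
for when, state in notifications([0, 30, DAY]):
    print(when, state)
# 0 Alert
# 330 Healthy
# 86400 Alert
# 86700 Healthy
```

The second exception, 30 seconds after the first, does not produce a new email because the alert is still active; the one the next day does, because the alert auto-resolved in between.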
Note: this was using App Insights with a Function App. The Function App failure metric had problems and wasn't reliable for this: Azure kept logging 0.2 exceptions/s and treating that as exceeding the threshold of 1 in 5 minutes...