As shown in my Stack Overflow question Azure webjob not appearing to respect MaxDequeueCount property, I'm troubleshooting a problem where despite my setting of the MaxDequeueCount property to 1, some items are dequeued many times before they're poisoned (in fact some items may never be poisoned at all and just dequeue, fail and are retried and fail endlessly).
The Webjobs SDK handles the retries and poisoning of queue triggered messages automatically and I'm looking for logs that contain details of that handling.
For example, I can see that my function has detected a new queue item by viewing the webjob's log through the SCM at https://myappengine.scm.azurewebsites.net/vfs/data/jobs/continuous/StuffProcessor/job_log.txt
(By the way if I've enabled detailed logging to Azure storage on the web app can I get this same info in a Blob?).
[02/22/2017 01:47:22 > ec8d0f: INFO] Executing: 'StuffProcessor.ProcessQueueMessage' - Reason: 'New queue message detected on 'stuff-processor'.'
[02/22/2017 01:47:26 > ec8d0f: INFO] Executed: 'StuffProcessor.ProcessQueueMessage' (Succeeded)
[02/22/2017 01:47:26 > ec8d0f: INFO] Executing: 'StuffProcessor.ProcessQueueMessage' - Reason: 'New queue message detected on 'stuff-processor'.'
I can also get some information on an item's dequeue count by viewing the logs in the azure-jobs-host-archive
container once I've enabled detailed logging to Azure storage on the web app:
{
"Type": "FunctionCompleted",
"EndTime": "2017-02-22T00:07:40.8133081+00:00",
"Failure": {
"ExceptionType": "Microsoft.Azure.WebJobs.Host.FunctionInvocationException",
"ExceptionDetails": "Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: ItemProcessor.ProcessQueueMessage ---> MyApp.Exceptions.MySpecialAppExceptionType: Exception of type 'MyApp.Exceptions.MySpecialAppExceptionType' was thrown.
},
"ParameterLogs": {},
"FunctionInstanceId": "1ffac7b0-1290-4343-8ee1-2af0d39ae2c9",
"Function": {
"Id": "MyApp.Processors.ItemProcessor.ProcessQueueMessage",
"FullName": "MyApp.Processors.ItemProcessor.ProcessQueueMessage",
"ShortName": "ItemProcessor.ProcessQueueMessage",
"Parameters": [
{
"Type": "QueueTrigger",
"AccountName": "MyStorageAccount",
"QueueName": "stuff-processor",
"Name": "sourceFeedItemQueueItem"
},
{
"Type": "BindingData",
"Name": "dequeueCount"
},
{
"Type": "ParameterDescriptor",
"Name": "logger"
}
]
},
"Arguments": {
"sourceFeedItemQueueItem": "{\"SourceFeedUpdateID\":437530,\"PodcastFeedID\":\"2d48D2sf2\"}",
"dequeueCount": "96",
"logger": null
},
"Reason": "AutomaticTrigger",
"ReasonDetails": "New queue message detected on 'stuff-processor'.",
"StartTime": "2017-02-22T00:07:40.6017341+00:00",
"OutputBlob": {
"ContainerName": "azure-webjobs-hosts",
"BlobName": "output-logs/1ffd3c7b012c043438ed12af0d39ae2c9.txt"
},
"ParameterLogBlob": {
"ContainerName": "azure-webjobs-hosts",
"BlobName": "output-logs/1cf2c1b012sa0d3438ee12daf0d39ae2c9.params.txt"
},
"LogLevel": "Info",
"HostInstanceId": "d1825bdb-d92a-4657-81a4-36253e01ea5e",
"HostDisplayName": "ItemProcessor",
"SharedQueueName": "azure-webjobs-host-490daea03c70316f8aa2509438afe8ef",
"InstanceQueueName": "azure-webjobs-host-d18252sdbd92a4657d1a436253e01ea5e",
"Heartbeat": {
"SharedContainerName": "azure-webjobs-hosts",
"SharedDirectoryName": "heartbeats/490baea03cfdfd0416f8aa25aqr438afe8ef",
"InstanceBlobName": "zd1825bdbdsdgga465781a436q53e01ea5e",
"ExpirationInSeconds": 45
},
"WebJobRunIdentifier": {
"WebSiteName": "myappengine",
"JobType": "Continuous",
"JobName": "ItemProcessor",
"RunId": ""
}
}
What I can't find though are logs which show detail for a particular queue item where processing fails due to an exception and is placed in the poison queue. I'm looking for these in an attempt to further troubleshoot the apparent ignoring of the MaxDequeueCount property. Is this logged?
UPDATE: I found the post Azure WebJobs with Storage Queue and Exceptions and it contained the following screenshot:
That screenshot shows the standard "New queue message detected..." messages (which I see both on Azure and running locally in the emulator) and it also shows "Message has reached MaxDequeueCount of X...Moving message to queue 'xyz-poison'", which I only see locally in the emulator. Does the Azure-based runtime not show this info for some reason? I never see the poisoning-related messages like this when running locally in the console window or via the webjobs dashboard or when running on Azure.
Are you using Azure Storage version 8.* ? It contains a breaking change.
When a queue message object gets inserted into a message queue in Azure Storage 8, the Message Id of the original queue message object is overwritten with the unique message id of the newly inserted message.
This means that it looses its reference to the poison message and can no longer remove it.
https://github.com/Azure/azure-webjobs-sdk/issues/985
As for logging, 2 options:
Create your own QueueProcessorFactory and QueueProcessor - you can override the method that handles poison messages to produce more debug output. There you can also fix the issue altogether if you'd like to stay on Azure Storage 8.*.
Register a trace monitor. https://github.com/Azure/azure-webjobs-sdk-extensions/wiki/Error-Monitoring#tracemonitor
In your previous post, you have the triggered function as follows
Anything you write to the logger should appear in the logs on the WebJobs dashboard. My suggestions are:
Change 'if (item == "exception")' to 'if (item.ToLower().StartsWith("exception")'. Then append a number to the exception message when testing. Log the text of the queue message to the TextWriter so you can be sure that you're looking at the same message you think you are inspecting.
You can get the message's dequeue count as a parameter to a function. See here. Check out the FailAlways method in the example. Log this to the TextWriter as well.
Try running this locally. All output will be written to the console window. See if you get the same behavior.
I'm really interested in finding what's causing this behavior, so make sure you let us know if you crack it!