Diagnosing dropped notifications from Azure Notifi

Posted 2019-02-22 03:25

Question:

We have a (mostly) successful implementation of push notifications to iOS and Android devices through Azure Notification Hubs.

The problem is that some of the iOS devices are apparently never receiving notifications that are sent by Azure Notification Hubs.

We use templates and tags to direct the messages to the appropriate devices. The tags are interest topics, and never user-specific, so we're expecting one notification for a tag to be pushed to all devices subscribed to that tag.

The Android devices seem to receive their notifications flawlessly, but the iOS devices are not consistent. Most of them work. A couple do not.

We are well aware that push notifications are delivered on a best-effort basis with no guarantee of reliability, but our limited testing has revealed more devices that consistently fail to receive push notifications than seems reasonable (more than two failures from about a dozen devices).

Here's the setup:

We have a simple C# routine in the back end that connects to Azure Notification Hubs and sends the notifications:

var outcome = await hub.SendTemplateNotificationAsync(properties, tag);

We have used the GetAllRegistrationsAsync method to make sure that every device we are checking has successfully registered and is using the correct template. Every device is registered, all the templates are correct.

We are not in "test mode"; the enableTestSend parameter of NotificationHubClient.CreateClientFromConnectionString is set to false.

Troubleshooting:

When we send the notification out, most devices receive the notification and, in the specific case we're testing, update the badge counter with the correct number.

However, a couple of devices do not seem to get the notification. One of them did receive a notification after we rebooted it, but then stopped receiving them again.

Using the above-mentioned GetAllRegistrationsAsync method, we have verified that the problem devices are correctly registered on Azure and have the correct tags and templates.

We were able to determine the device tokens of the problem devices from the Azure registrations. We used a PHP script which communicates directly with APNS to send a notification just to the problem devices using their device tokens. Every time, the device receives this direct-send notification. It's only the notifications from Azure which are unreliable.

When we examine the Azure Notification Hub Monitor page, we see these metrics for the past 24 hours:

  • 967 APNS Successful Notifications
  • 3 APNS Bad Channel Errors
  • 2 APNS Expired Channel Errors
  • 4 APNS Errors

... and no other errors reported for APNS or for Azure in general. The failure rate we're seeing should have produced an error count over 20.
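To make the mismatch concrete, here is a rough back-of-the-envelope comparison in Python. It uses only the counts quoted above and takes "2 of 12 devices" as a conservative reading of "more than two failures from about a dozen devices"; the test devices may not be representative of the whole population, so treat the result as an illustration rather than a measurement.

```python
# Counts copied from the Monitor page quoted above (past 24 hours).
apns_success = 967
apns_errors = {"bad_channel": 3, "expired_channel": 2, "other": 4}

reported_errors = sum(apns_errors.values())
total_attempts = apns_success + reported_errors
reported_error_rate = reported_errors / total_attempts

# Assumption: exactly 2 failing devices out of 12, a lower bound on
# "more than two failures from about a dozen devices".
observed_failure_rate = 2 / 12

print(f"reported errors:       {reported_errors} of {total_attempts}")
print(f"reported error rate:   {reported_error_rate:.2%}")    # 0.92%
print(f"observed failure rate: {observed_failure_rate:.2%}")  # 16.67%
```

The hub reports errors for under 1% of attempts, while the hands-on testing suggests a failure rate well above 10%, so most of the missing notifications are not showing up as errors at all.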

We have not been able to determine which device tokens were responsible for the errors; is there a way to get this information from Azure?

We're at a loss to explain why we can send notifications directly to these devices over APNS itself, but not through Azure, and why it is that Azure doesn't report more errors than it does.

Any suggestions or insights?

Answer 1:

It's quite possible that you have some sandbox device tokens in your database (I'm not sure if the device tokens are stored in your server or in Azure Notification Hub). When trying to send a notification with a sandbox device token to the production push environment, an InvalidToken error is returned by Apple, and the connection is closed.

Very often, by the time the server that sends push notifications to Apple's APNS server gets the error response, it has already sent many more notifications (possibly with valid tokens), all of which are discarded by Apple. From that point on, Apple accepts new notifications only after a new connection with APNS is established, so every message that was written to the old connection after the invalid token needs to be resent. It is possible that Azure doesn't handle this resending correctly.
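The failure mode described above can be illustrated with a toy simulation in Python. FakeApnsConnection, send_naive and send_with_resend are hypothetical names invented for this sketch; it models only the behaviour described in this answer (a rejected token kills the connection and everything written afterwards is dropped) and says nothing about how Azure Notification Hubs is actually implemented.

```python
from typing import List, Optional, Set, Tuple


class FakeApnsConnection:
    """Toy model: delivers tokens until it sees an invalid one, then goes dead."""

    def __init__(self, invalid_tokens: Set[str]) -> None:
        self.invalid_tokens = invalid_tokens
        self.delivered: List[str] = []
        self.failed_index: Optional[int] = None  # position of the rejected token
        self._sent = 0

    def send(self, token: str) -> None:
        index = self._sent
        self._sent += 1
        if self.failed_index is not None:
            return  # connection already closed: silently discarded
        if token in self.invalid_tokens:
            self.failed_index = index  # error is reported, connection closes
            return
        self.delivered.append(token)


def send_naive(tokens: List[str], invalid_tokens: Set[str]) -> List[str]:
    """Write the whole batch to one connection and never look back."""
    conn = FakeApnsConnection(invalid_tokens)
    for token in tokens:
        conn.send(token)
    return conn.delivered


def send_with_resend(
    tokens: List[str], invalid_tokens: Set[str]
) -> Tuple[List[str], List[str]]:
    """Reconnect after each rejection and resend everything that followed it."""
    delivered: List[str] = []
    rejected: List[str] = []
    pending = list(tokens)
    while pending:
        conn = FakeApnsConnection(invalid_tokens)
        for token in pending:
            conn.send(token)
        delivered.extend(conn.delivered)
        if conn.failed_index is None:
            break  # whole remainder went through
        rejected.append(pending[conn.failed_index])
        pending = pending[conn.failed_index + 1:]  # resend only what came after
    return delivered, rejected


batch = ["a", "b", "SANDBOX", "c", "d"]
print(send_naive(batch, {"SANDBOX"}))        # ['a', 'b']
print(send_with_resend(batch, {"SANDBOX"}))  # (['a', 'b', 'c', 'd'], ['SANDBOX'])
```

In the naive case the devices behind "c" and "d" never see the notification even though their tokens are valid, which matches the symptom of working devices that silently miss pushes.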

As you said, the Azure Notification Hub Monitor page shows a few errors. I suspect that 3 APNS Bad Channel Errors means invalid device tokens. I don't know how many invalid device tokens you actually have in the DB, but even one can cause many notifications with valid tokens not to be accepted by Apple.

The best solution is to test all the device tokens in the DB and figure out the ones that are invalid and delete them.
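One way to organise that clean-up is to probe each token on its own, so a rejection can always be attributed to a single token. The Python sketch below assumes a hypothetical send_one callable that sends a single notification directly to APNS (for example, a wrapper around a direct-send script like the one mentioned in the question) and returns True when the token is accepted; neither the name nor the return convention comes from any real library.

```python
from typing import Callable, Iterable, List, Tuple


def partition_tokens(
    tokens: Iterable[str], send_one: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split tokens into (valid, invalid) by probing each one individually."""
    valid: List[str] = []
    invalid: List[str] = []
    seen = set()
    for token in tokens:
        if token in seen:
            continue  # skip duplicates so each token is probed once
        seen.add(token)
        if send_one(token):
            valid.append(token)
        else:
            invalid.append(token)
    return valid, invalid


# Stand-in sender for demonstration only: pretends tokens starting
# with "sandbox-" are rejected by the production environment.
def fake_send_one(token: str) -> bool:
    return not token.startswith("sandbox-")


valid, invalid = partition_tokens(
    ["prod-1", "sandbox-7", "prod-2", "prod-1"], fake_send_one
)
print(valid)    # ['prod-1', 'prod-2']
print(invalid)  # ['sandbox-7']
```

The tokens that end up in the invalid list are the candidates for deletion from the database or from the hub's registrations.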