Simulate 10,000 Azure IoT Hub Device connections f

Posted 2019-01-27 04:04

Question:

We are developing a .NET Core service that will be hosted in Azure Service Fabric. This SF service needs to interact with 10,000 devices registered in Azure IoT Hub via its AMQP 1.0 SSL/TLS endpoints. Each IoT Hub device has its own security token and connection string provided by the IoT Hub service.

For our scenario we need to listen to all cloud-to-device messages coming from the 10,000 IoT Hub device instances and "route" these to a central Service Bus topic that the actual "gateways" in the field listen to. So basically we want to forward messages from 10,000 Service Bus queues into one central queue.

What is the best approach to handle these 10,000 AMQP listeners from an SF service? Is there a way to reuse AMQP connections, sessions, or links so that we cache or share resources? And how can we dynamically spread the load of connection maintenance over the 5 nodes in the SF cluster?

We are evaluating these NuGet packages for the implementation: Microsoft.Azure.ServiceBus, AMQPNetLite, and Microsoft.Azure.Devices.Client.

We are doing some tests using the Microsoft.Azure.Devices.Client library; see a simplified code sample below:

using System;
using System.Fabric;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;
using Microsoft.ServiceFabric.Services.Runtime;

namespace ID.Monitoring.MonServer.ServiceFabric.ServiceBus
{
    /// <summary>
    /// An instance of this class is created for each service instance by the Service Fabric runtime.
    /// </summary>
    internal sealed class ServiceBus : StatelessService
    {
        private readonly DeviceClient _deviceClient;
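        // Written by the connection status callback and read by RunAsync, possibly on different threads.
        private volatile ConnectionStatus _status;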

        public ServiceBus(StatelessServiceContext context)
            : base(context)
        {
            _deviceClient = DeviceClient.CreateFromConnectionString("HostName=id-monitoring-dev.azure-devices.net;DeviceId=100;SharedAccessSignature=SharedAccessSignature sr=id-monitoring-dev.azure-devices.net%2Fdevices%2F100&sig={token}&se=1553265888", TransportType.Amqp_Tcp_Only);
        }

        /// <summary>
        /// This is the main entry point for your service instance.
        /// </summary>
        /// <param name="cancellationToken">Canceled when Service Fabric needs to shut down this service instance.</param>
        protected override async Task RunAsync(CancellationToken cancellationToken)
        {
            _deviceClient.SetConnectionStatusChangesHandler(ConnectionStatusChangeHandler);

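            // Exit the loop when Service Fabric requests shutdown instead of spinning forever.
            while (!cancellationToken.IsCancellationRequested)
            {
                if (_status != ConnectionStatus.Connected)
                {
                    await _deviceClient.OpenAsync().ConfigureAwait(false);
                }
                var receivedMessage = await _deviceClient.ReceiveAsync(TimeSpan.FromSeconds(10)).ConfigureAwait(false);

                // ReceiveAsync returns null when the timeout elapses without a message.
                if (receivedMessage != null)
                {
                    var messageData = Encoding.ASCII.GetString(receivedMessage.GetBytes());
                    //TODO: handle incoming message and publish to common 
                    await _deviceClient.CompleteAsync(receivedMessage).ConfigureAwait(false);
                }
            }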
        }

        private void ConnectionStatusChangeHandler(ConnectionStatus status, ConnectionStatusChangeReason reason)
        {
            _status = status;
        }
    }
}

Question: Does this scale well to 10,000 Service Fabric service instances? Or are there more efficient ways to maintain this many AMQP Service Bus listeners from a Service Fabric service environment? Is there a way to apply AMQP connection multiplexing, perhaps?

Answer 1:

Take a look at this.

The second answer there provides a sample that multiplexes multiple devices onto one AMQP connection.



Answer 2:

The approach you chose to monitor your devices won't scale well and will be hard to maintain.

Currently, Service Fabric limits how many instances of a service you can place on a single node. For example, if you create an application with your ServiceBus service and try to spawn 10,000 instances, you will hit this limit, which is the number of nodes: with a 5-node cluster and the default scaling approach, you will be able to run only 5 instances of your service.

To bypass this issue you have some options:

Partitioning:

To have a single stateless service run more instances than the node count, you have to partition your service. Assuming you have a 5-node cluster and need 10,000 instances, you will need 10,000 partitions, that is, 2,000 partitions running on each node. If you use the shared process model and have enough RAM for this, this approach might help you. Please take a look at this thread and this thread before following this approach.
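Whichever partition count is chosen, each partition needs a deterministic way to decide which devices it owns. The following is a minimal sketch, assuming device IDs are strings and the partition count is fixed. `DevicePartitioner` and `GetPartitionIndex` are hypothetical names for illustration, not Service Fabric APIs.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DevicePartitioner
{
    // Maps a device ID to a partition index in [0, partitionCount).
    // A stable hash (first 8 bytes of SHA-256) is used because
    // string.GetHashCode() is randomized per process on .NET Core,
    // so it would give different answers on different nodes.
    public static int GetPartitionIndex(string deviceId, int partitionCount)
    {
        if (deviceId == null) throw new ArgumentNullException(nameof(deviceId));
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(deviceId));
            ulong value = BitConverter.ToUInt64(hash, 0);
            return (int)(value % (ulong)partitionCount);
        }
    }
}
```

A partition would then listen only for the devices whose index matches its own. With fewer partitions than devices (say 5 or 50 instead of 10,000), each partition ends up owning many devices, which combines naturally with handling several connections per instance.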

Multiple Named Services:

A named service is a running service definition for one service type. In this case you would create one per device, like:

  • ServiceBusType
    • ServiceBus-Device1
    • ServiceBus-Device2
    • ServiceBus-Device3

This approach will consume too many resources on your machines, as you will be running one instance for each device, but it is easy to manage, as you can spawn a new instance for each new device without affecting other running services.

Parallel Processing per instance:

Here each instance would be responsible for processing multiple messages concurrently. In this case you would create 2,000 connections per instance (if running 5 instances, one per node, in the cluster). This will be lighter on resource consumption than the other approaches, but it is a bit harder to maintain, as you will have to handle the balancing yourself and might need an extra service to monitor and delegate tasks to all the services and ensure the messages are being processed evenly.
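The idea of one instance running many listeners can be sketched as one asynchronous receive loop per device, all awaited together. This is a minimal sketch under stated assumptions: the `receive` and `forward` delegates are hypothetical stand-ins for a per-device receive call (such as `DeviceClient.ReceiveAsync` followed by `CompleteAsync`) and a send to the central Service Bus topic, and `MultiDeviceListener` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MultiDeviceListener
{
    // Starts one receive loop per device and completes when all loops have stopped.
    // receive: hypothetical delegate; returns null when no message arrived in time.
    // forward: hypothetical delegate; publishes the payload to the central topic.
    public static Task RunAsync(
        IEnumerable<string> deviceIds,
        Func<string, CancellationToken, Task<string>> receive,
        Func<string, string, Task> forward,
        CancellationToken cancellationToken)
    {
        // ToList forces every loop to start before we wait on them.
        List<Task> loops = deviceIds
            .Select(id => ListenAsync(id, receive, forward, cancellationToken))
            .ToList();
        return Task.WhenAll(loops);
    }

    private static async Task ListenAsync(
        string deviceId,
        Func<string, CancellationToken, Task<string>> receive,
        Func<string, string, Task> forward,
        CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            string payload = await receive(deviceId, cancellationToken).ConfigureAwait(false);
            if (payload != null)
            {
                await forward(deviceId, payload).ConfigureAwait(false);
            }
        }
    }
}
```

Because the loops are asynchronous, 2,000 of them do not need 2,000 threads; they only occupy a thread while a message is actually being handled. Error handling and reconnect logic are deliberately left out of the sketch.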

Summary:

One instance handling one connection, one message at a time, will require 10,000 instances of your service. Partitioning will be similar, although you can use a shared process to reduce memory consumption. Memory consumption will still be high in both cases.

Multiple named services could be an option if the number of services were not too high, but you also wouldn't be able to share the connection, so I wouldn't recommend this approach for your scenario.

The third option is the most resource-friendly, but you will have to find a way to distribute the connections evenly across the cluster nodes.

You can also use a mixed approach: for example, a partitioned service in which each partition owns a key range of devices and handles multiple messages for those devices in parallel.

Please take a look at the links I've mentioned.



Answer 3:

I found that DeviceClient can be created with transport settings that allow the AmqpConnectionPoolSettings to be set.
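A minimal sketch of what that might look like with the Microsoft.Azure.Devices.Client package, assuming the `CreateFromConnectionString` overload that accepts an `ITransportSettings` array. `PooledDeviceClientFactory` is a hypothetical helper name, and the pool size of 10 is an illustrative value, not a recommendation.

```csharp
using Microsoft.Azure.Devices.Client;

public static class PooledDeviceClientFactory
{
    // Creates a DeviceClient whose AMQP transport opts in to connection pooling,
    // so that several device clients in the same process can share connections.
    // maxPoolSize is an illustrative default and should be tuned by measurement.
    public static DeviceClient Create(string deviceConnectionString, uint maxPoolSize = 10)
    {
        var transportSettings = new ITransportSettings[]
        {
            new AmqpTransportSettings(TransportType.Amqp_Tcp_Only)
            {
                AmqpConnectionPoolSettings = new AmqpConnectionPoolSettings
                {
                    Pooling = true,
                    MaxPoolSize = maxPoolSize
                }
            }
        };

        return DeviceClient.CreateFromConnectionString(deviceConnectionString, transportSettings);
    }
}
```

Each device still gets its own DeviceClient and its own credentials; only the underlying AMQP connections are shared. How many devices can be multiplexed onto one connection depends on the SDK version and the authentication type, so it is worth checking against the SDK documentation before relying on it at 10,000 devices.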