SignalR scaling out with Azure EventHub

Posted 2019-02-13 05:23

Question:

I am looking for a high-frequency scale-out solution for SignalR and am wondering if I can do it with Azure Event Hubs. If I use Event Hubs as the backplane for SignalR messages, will it become a bottleneck for me?

I've checked this page, but there is nothing about Event Hubs, as it is fairly new.

Answer 1:

I can't speak to the precise specifics of SignalR; however, you could in principle use Event Hubs as a backplane, but you need to be aware of its limitations.

SignalR's backplane scale-out pattern assumes that every server has access to every message and presumably processes them all. This puts a fairly clear limit on what a single backplane can do on commodity hardware or in the cloud. In a typical cloud you might be able to sustain 100 MB/s of data throughput (a nice round number for a 1 Gbit/s NIC); at the upper end of commodity hardware (and on Azure's HPC machines), about 1000 MB/s (a 10 Gbit/s NIC).
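To make that ceiling concrete, here is a rough back-of-envelope sketch, assuming the NIC is the only bottleneck and ignoring protocol overhead:

```python
# Back-of-envelope only: convert NIC line rate into usable backplane throughput.
# Assumes the NIC is the sole limit and ignores protocol/framing overhead.
def nic_mb_per_second(gbit_per_second: float) -> float:
    """Gigabits per second on the wire -> megabytes per second of payload (roughly)."""
    return gbit_per_second * 1000 / 8

print(nic_mb_per_second(1))   # ~125 MB/s -> call it 100 MB/s as a round working number
print(nic_mb_per_second(10))  # ~1250 MB/s -> call it 1000 MB/s
```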

So the question then is: can Azure Event Hubs take you to this architectural limit on throughput?

The answer to that is simply yes: 100 or 1,000 partitions will give you sufficient write throughput, and sufficient read capacity for two servers.
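As a rough sketch of that math, assuming about 1 MB/s ingress and 2 MB/s egress per partition (with egress shared across all readers of a partition; the actual limits are governed by throughput units and current Event Hubs quotas):

```python
import math

# Rough partition-count estimate for the broadcast backplane pattern.
# Assumes ~1 MB/s ingress and ~2 MB/s egress per partition, egress shared
# across all readers of that partition; treat this as an estimate only.
def partitions_needed(servers: int, mb_per_s_per_server: float,
                      ingress_mb_per_s: float) -> int:
    read_side = math.ceil(servers * mb_per_s_per_server / 2.0)   # egress-bound
    write_side = math.ceil(ingress_mb_per_s / 1.0)               # ingress-bound
    return max(read_side, write_side)

print(partitions_needed(servers=2,  mb_per_s_per_server=100, ingress_mb_per_s=100))  # 100
print(partitions_needed(servers=10, mb_per_s_per_server=100, ingress_mb_per_s=100))  # 500
```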

The next question is: if each server only needs to read 100 MB/s from the backplane, how many servers can read the data (i.e. if you're broadcasting 100 MB/s of stock ticks, where the data volume doesn't increase with the number of servers)?

The answer here is: as many as you want, but there are some tricks.

Event Hubs scales by partitioning the data stream. Each partition has a maximum read throughput of 2 MB/s, which is shared across all of its readers. However, you can simply multiply the number of partitions to make up for the split (adding more than 32 requires talking to Microsoft). The design assumption of Event Hubs (like Kafka and Kinesis) is that consumption will be split across machines, thereby avoiding the backplane limitation discussed earlier. Consumers that work together to read the stream form a Consumer Group (Azure appears to require a named CG even for a direct reader). In this backplane model there are no logical consumer groups, so the question is how to read the data.

The simplest solution is likely to use the high-level, auto-balancing Event Processor Host, with each server being its own Consumer Group with a fixed name. With only one server in each consumer group, each server will receive all the partitions (500 partitions for 10 servers to hit 100 MB/s each, i.e. roughly $11k/month plus $0.028 per million events).
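A minimal sketch of the per-server consumer-group idea, using the azure-eventhub Python SDK's EventHubConsumerClient as a stand-in for the .NET Event Processor Host; the connection string, hub name, the pre-created per-server consumer group, and the dispatch_to_local_clients helper are all placeholders/assumptions:

```python
import socket
from azure.eventhub import EventHubConsumerClient

# Each server reads the WHOLE stream in its own, pre-created consumer group,
# so every server sees every backplane message. The connection string, hub
# name and the "backplane-<hostname>" consumer group are placeholders.
CONN_STR = "<event-hubs-connection-string>"
HUB_NAME = "signalr-backplane"
CONSUMER_GROUP = f"backplane-{socket.gethostname()}"  # must already exist on the hub

def on_event(partition_context, event):
    # Hand the raw message to the local SignalR hub for fan-out to clients.
    dispatch_to_local_clients(event.body_as_str())  # placeholder for your dispatch logic

client = EventHubConsumerClient.from_connection_string(
    CONN_STR, CONSUMER_GROUP, eventhub_name=HUB_NAME)

with client:
    # No partition_id is given, so this one client receives from all partitions.
    client.receive(on_event=on_event, starting_position="@latest")
```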

This approach has one key limitation: you are limited to 20 consumer groups per Event Hub. So you can chain Event Hubs together, or build a tree of them with this approach, to reach arbitrary numbers of servers.
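A sketch of one relay node in such a fan-out tree, again using the Python SDK under the same assumptions (hub names, connection strings and the "relay" consumer group are placeholders): it reads the upstream hub and republishes every event to each downstream hub.

```python
from azure.eventhub import EventHubConsumerClient, EventHubProducerClient, EventData

# One relay node in a fan-out tree: read the upstream hub and republish each
# event to every downstream hub. Names and connection strings are placeholders.
UPSTREAM_CONN = "<upstream-connection-string>"
DOWNSTREAM_CONNS = ["<downstream-1-connection-string>", "<downstream-2-connection-string>"]

producers = [
    EventHubProducerClient.from_connection_string(c, eventhub_name="backplane")
    for c in DOWNSTREAM_CONNS
]

def on_event(partition_context, event):
    for producer in producers:
        # One event per batch for clarity; in practice you would batch more.
        batch = producer.create_batch()
        batch.add(EventData(event.body_as_str()))
        producer.send_batch(batch)

consumer = EventHubConsumerClient.from_connection_string(
    UPSTREAM_CONN, "relay", eventhub_name="backplane")  # "relay" group assumed to exist

with consumer:
    consumer.receive(on_event=on_event, starting_position="@latest")
```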

The other option is to use direct clients that connect to specific partitions. A single partition in a consumer group can have 5 readers, which reduces the need for chaining hubs by a factor of 5 and thereby cuts the per-event cost by a factor of 5 (it doesn't reduce the throughput unit requirements).
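A sketch of a direct reader pinned to a single partition, under the same SDK assumptions (connection string, hub name, group name, partition id and the handle_backplane_message helper are placeholders); up to five such readers can share one partition within a consumer group, and you would run one receive loop per partition to cover the whole stream:

```python
from azure.eventhub import EventHubConsumerClient

# Direct reader pinned to one partition inside a shared consumer group.
# Up to five readers per partition per consumer group can connect this way,
# so five servers can share a single group while each still reads everything.
CONN_STR = "<event-hubs-connection-string>"

def on_event(partition_context, event):
    handle_backplane_message(partition_context.partition_id,
                             event.body_as_str())  # placeholder for your handler

client = EventHubConsumerClient.from_connection_string(
    CONN_STR, "shared-backplane-group", eventhub_name="signalr-backplane")

with client:
    # partition_id pins this receive loop to a single partition; run one loop
    # (thread or process) per partition to read the whole stream.
    client.receive(on_event=on_event, partition_id="0", starting_position="@latest")
```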

In summary, Event Hubs shouldn't become a bottleneck before any backplane would become a bottleneck. But don't build something on a backplane if you expect it to go beyond 100 MB/s of traffic.

I haven't spoken about latency; you'll need to test that yourself. But chances are you're not doing HFT in the cloud, and there's a reason realtime games are typically run in single instances.