Should event hubs be split on message type?

Posted 2019-04-08 12:24

Question:

I am considering using Azure Event Hubs for a project I am currently working on. We use Service Bus queues for commands today, with one queue per message type.

Would it make sense to have several Event Hubs or is it better to use one hub for several message types?

Answer 1:

This question is full of tradeoffs. The answer depends on your judgement about which systems you expect to build now and in the future, and how they might use the different event types.

Below is an excerpt from the guidance Jay Kreps has given for designing systems on top of Apache Kafka. It applies well to Event Hubs too, with the major exceptions of the limits imposed by short retention periods and the cap on the number of consumer groups.

Let’s begin with pure event data: the activities taking place inside the company. In a web company these might be clicks, impressions, and various user actions. FedEx might have package deliveries, package pickups, driver positions, notifications, transfers and so on.

These types of events can be represented with a single logical stream per action type. For simplicity I recommend naming the Avro schema and the topic the same thing, e.g. PageViewEvent. If the event has a natural primary key you can use that to partition data in Kafka, otherwise the Kafka client will automatically load balance data for you.

...

We experimented at various times with mixing multiple events in a single topic and found this generally led to undue complexity. Instead, give each event its own topic; consumers can always subscribe to multiple such topics to get a mixed feed when they want that.

I generally agree with this advice (and you should read that entire blog post if you're designing a system on Event Hubs/Kafka/Kinesis). Subscribers needing to ignore messages they aren't interested in is not only annoying, it becomes problematic later if one of the event types starts to dominate the combined stream.
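To make the cost of a dominated stream concrete, here is a minimal sketch in plain Python. It does not use the Event Hubs or Kafka SDKs; the event type names (`PageView`, `OrderPlaced`), the 97-to-3 ratio, and both helper functions are assumptions chosen purely for illustration.

```python
# Hypothetical events as (event_type, payload) tuples. One type dominates.
combined_stream = (
    [("PageView", i) for i in range(97)]
    + [("OrderPlaced", i) for i in range(3)]
)


def consume_combined(stream, wanted_type):
    """Read every event from a shared stream and discard the unwanted ones."""
    read = 0
    kept = []
    for event_type, payload in stream:
        read += 1  # the consumer pays for every event, wanted or not
        if event_type == wanted_type:
            kept.append(payload)
    return read, kept


def split_by_type(stream):
    """Model one stream per event type, as the excerpt recommends."""
    streams = {}
    for event_type, payload in stream:
        streams.setdefault(event_type, []).append(payload)
    return streams


read, kept = consume_combined(combined_stream, "OrderPlaced")
per_type = split_by_type(combined_stream)

print(f"combined: read {read} events to keep {len(kept)}")
print(f"per-type: read {len(per_type['OrderPlaced'])} events to keep them all")
```

With a combined stream the `OrderPlaced` consumer reads all 100 events to keep 3; with a stream per type it reads only the 3 it wants. A real consumer would also pay network and deserialization cost for each discarded event.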

But having multiple streams and combining them together does have costs, and they need to be weighed in making a decision. I've listed some that come to mind.

  1. You lose ordering between events of different types from the same source unless you spend the effort to add it back.

  2. If you want to commit progress on the different topics together, you need to coordinate those commits yourself.

  3. If you are partitioning the event streams on a primary key shared between the topics and want the partitions in each topic to travel together, you can't use the high-level clients like EventProcessorHost, because partitions can end up autobalanced to different processes.

  4. A consumer with one thread per partition ends up multiplying the needed number of threads by the number of topics. Probably not an issue unless you have expensive structures that can't be shared.
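As a rough illustration of what "adding ordering back" in point 1 might look like, the sketch below assumes the producer stamps every event with its own increasing sequence number before sending it to the per-type hub. The `Event` class, its fields, and `merge_in_source_order` are hypothetical names, not part of any Event Hubs or Kafka API; the merge itself uses `heapq.merge` from the Python standard library.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source_id: str   # hypothetical: identifies the producer (the partition key)
    sequence: int    # hypothetical: counter the producer increments per event
    event_type: str


def merge_in_source_order(*streams):
    """Merge per-type streams for ONE source into a single ordered feed.

    Each input must already be sorted by sequence, which holds when all
    events from the source land in the same partition of their hub.
    """
    return heapq.merge(*streams, key=lambda e: e.sequence)


# Two per-type streams from the same source, each internally ordered.
pickups = [
    Event("truck-7", 1, "PackagePickup"),
    Event("truck-7", 4, "PackagePickup"),
]
positions = [
    Event("truck-7", 2, "DriverPosition"),
    Event("truck-7", 3, "DriverPosition"),
]

merged = list(merge_in_source_order(pickups, positions))
print([(e.sequence, e.event_type) for e in merged])
```

This works on finite lists. A live consumer would additionally have to buffer events until every stream has advanced past a given sequence number, which is part of the effort the first point refers to.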

In my own deployment we use different event hubs for different event types even though we currently use the same code to process them all. This is simply because I expect to add new components that only care about certain event types. I hope this helps, and at worst I've told you to go look at the guidance for Kafka, since the principles are the same and Kafka has been around longer.