I have a data flow use case where I want to define topics based on each of the customer repositories (which might number in the 100,000s). Each data flow would be a topic with partitions (on the order of a few tens) defining the different stages of the flow.
Is Kafka a good fit for a scenario like this? If not, how would I remodel my use case to handle such scenarios? An additional constraint is that one customer repository's data cannot be mingled with another's, even during processing. A sketch of the layout I have in mind is below.
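For illustration only, the topic layout I have in mind would look roughly like this (topic names, counts, and the replication factor are placeholders, not a working setup):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TopicLayoutSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // One topic per customer repository; the partitions of each topic would
        // represent the stages of that repository's data flow.
        List<NewTopic> topics = new ArrayList<>();
        for (int repo = 0; repo < 100_000; repo++) {               // order of 100,000s of repositories
            String topicName = "customer-repo-" + repo + "-flow";  // hypothetical naming scheme
            int stagePartitions = 20;                              // "a few tens" of stages
            topics.add(new NewTopic(topicName, stagePartitions, (short) 3));
        }

        // Purely illustrative: creating this many topics in a single request
        // is not something I would actually do in production.
        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(topics).all().get();
        }
    }
}
```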
Update, September 2018: As of Kafka v2.0, a Kafka cluster can have hundreds of thousands of topics. See https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions.
Initial answer below for posterity:
The rule of thumb is that the number of Kafka topics can be in the thousands.
The Kafka FAQ gives an abstract guideline on how many topics and partitions a cluster can handle.
The article http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ (written by Jun Rao) adds further details, focusing in particular on the impact of the number of partitions.
IMHO your use case / model is a bit of a stretch for a single Kafka cluster, though not necessarily for Kafka in general. With the little information you shared (I understand that a public forum is not the best place for sensitive discussions :-P), the only off-the-cuff suggestion I can offer is to consider using more than one Kafka cluster, since you mentioned that customer data must be strictly isolated anyway (including during the processing steps).
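To make that concrete, here is a minimal sketch of how producers could be routed to per-customer (or per-customer-group) clusters; the customer-to-cluster mapping, bootstrap addresses, and topic naming are assumptions made up for illustration:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PerCustomerClusterRouter {
    // Hypothetical assignment of customers (or customer groups) to dedicated clusters.
    private static final Map<String, String> CUSTOMER_TO_BOOTSTRAP = Map.of(
            "customer-a", "kafka-cluster-1.example.com:9092",
            "customer-b", "kafka-cluster-2.example.com:9092");

    // One producer per cluster, created lazily; records for different customers
    // never leave their assigned cluster.
    private final Map<String, KafkaProducer<String, String>> producers = new HashMap<>();

    public void send(String customerId, String payload) {
        String bootstrap = CUSTOMER_TO_BOOTSTRAP.get(customerId);
        KafkaProducer<String, String> producer = producers.computeIfAbsent(bootstrap, servers -> {
            Properties props = new Properties();
            props.put("bootstrap.servers", servers);
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            return new KafkaProducer<>(props);
        });
        // Topic name follows the per-repository scheme from the question; still one
        // topic per customer repository, just spread across multiple clusters.
        producer.send(new ProducerRecord<>("customer-repo-" + customerId + "-flow", customerId, payload));
    }
}
```

A similar routing layer would apply to consumers and any processing jobs, so the isolation holds across the whole pipeline, not just at ingestion.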
I hope this helps a bit!