Is Zookeeper a must for Kafka?

Posted 2020-02-07 14:35

In Kafka, I would like to use only a single broker, single topic, and a single partition, with one producer and multiple consumers (each consumer getting its own copy of the data from the broker). Given this, I do not want the overhead of using ZooKeeper. Can I not use the broker on its own? Why is ZooKeeper a must?
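For reference, "each consumer getting its own copy of the data" corresponds to giving every consumer its own consumer group: consumers in different groups each receive the full stream from the single partition. Below is a minimal Java sketch; the broker address, topic name, and the UUID-based group naming are assumptions, not anything prescribed by Kafka.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.UUID;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CopyPerConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Single broker; the address is a placeholder for this sketch.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A distinct group.id per consumer process means each one
        // independently reads every message in the topic.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "copy-" + UUID.randomUUID());
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```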

9 answers
Ridiculous、 · 2020-02-07 14:57

Yes, Zookeeper is required for running Kafka. From the Kafka Getting Started documentation:

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.
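Once both processes are up (the bundled zookeeper-server-start and kafka-server-start scripts do this), a quick way to confirm from code that the broker is reachable is the Java AdminClient. A small sketch; the broker address is an assumption:

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class CheckBroker {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Address of the single broker started by the quickstart; adjust as needed.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            // With a single broker there should be exactly one node listed.
            System.out.println("cluster id: " + cluster.clusterId().get());
            System.out.println("nodes: " + cluster.nodes().get());
        }
    }
}
```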

As to why: people long ago discovered that you need some way of coordinating tasks, state management, configuration, etc. across a distributed system. Some projects have built their own mechanisms (think of the configuration server in a MongoDB sharded cluster, or a master node in an Elasticsearch cluster). Others have chosen to take advantage of ZooKeeper as a general-purpose distributed process coordination system. So Kafka, Storm, HBase, and SolrCloud, to name just a few, all use ZooKeeper to help manage and coordinate.

Kafka is a distributed system and is built to use ZooKeeper. The fact that you are not using any of the distributed features of Kafka does not change how it was built. In any event, there should not be much overhead from using ZooKeeper. A bigger question is why you would use this particular design pattern: a single-broker deployment of Kafka misses out on all of the reliability features of a multi-broker cluster, along with its ability to scale.

Melony? · 2020-02-07 14:57

As explained by others, Kafka (even in the most recent versions) will not work without ZooKeeper.

Kafka uses ZooKeeper for the following (a short inspection sketch follows the list):

Electing a controller. The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. ZooKeeper is used to elect a controller, make sure there is only one, and elect a new one if it crashes.

Cluster membership - which brokers are alive and part of the cluster? This is also managed through ZooKeeper.

Topic configuration - which topics exist, how many partitions each has, where are the replicas, who is the preferred leader, what configuration overrides are set for each topic

(0.9.0) - Quotas - how much data is each client allowed to read and write

(0.9.0) - ACLs - who is allowed to read and write to which topic

(old high level consumer) - Which consumer groups exist, who are their members, and what is the latest offset each group got from each partition

[from https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka/answer/Gwen-Shapira]
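To make the items above concrete, here is a small sketch that reads a few of the znodes a ZooKeeper-backed Kafka cluster maintains, using the plain org.apache.zookeeper client; the connection string is an assumption:

```java
import java.util.List;

import org.apache.zookeeper.ZooKeeper;

public class InspectKafkaZnodes {
    public static void main(String[] args) throws Exception {
        // The ZooKeeper address is a placeholder for this sketch.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        try {
            // Cluster membership: live brokers register ephemeral nodes under /brokers/ids.
            List<String> brokerIds = zk.getChildren("/brokers/ids", false);
            System.out.println("live brokers: " + brokerIds);

            // Controller election: the current controller is recorded in /controller.
            byte[] controller = zk.getData("/controller", false, null);
            System.out.println("controller: " + new String(controller));

            // Topic configuration: topics and their partition assignments live under /brokers/topics.
            List<String> topics = zk.getChildren("/brokers/topics", false);
            System.out.println("topics: " + topics);
        } finally {
            zk.close();
        }
    }
}
```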

Regarding your scenario (only one broker instance and one producer with multiple consumers), you could also use Pusher: create a channel, push events to that channel, and have each consumer subscribe to it and handle those events. https://pusher.com/
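If you do go the Pusher route, server-side publishing looks roughly like the sketch below; the app credentials, channel, and event names are placeholders, and the API shown is an assumption based on the pusher-http-java server library rather than anything from the answer above:

```java
import java.util.Collections;

import com.pusher.rest.Pusher;

public class PusherPublish {
    public static void main(String[] args) {
        // Credentials come from the Pusher dashboard; these are placeholders.
        Pusher pusher = new Pusher("APP_ID", "APP_KEY", "APP_SECRET");

        // Every client subscribed to "my-channel" receives its own copy of this event.
        pusher.trigger("my-channel", "my-event",
                Collections.singletonMap("message", "hello"));
    }
}
```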

地球回转人心会变 · 2020-02-07 14:58

IMHO ZooKeeper is not overhead; it makes your life a lot easier.

It is basically used to maintain coordination between different nodes in a cluster. One of the most important things historically was that Kafka's old high-level consumer used ZooKeeper to periodically commit offsets, so that in case of a failure consumption could resume from the previously committed offset (imagine taking care of all this on your own); newer clients commit offsets to Kafka itself, but the idea is the same.
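As an illustration of that commit-and-resume idea, here is a sketch using the modern Java consumer (which stores offsets in Kafka itself rather than in ZooKeeper); the broker address, topic, and group id are assumptions:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitAndResume {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Commit offsets explicitly after processing instead of relying on auto-commit.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // If this process dies and restarts, the group resumes from the
                // last committed offset rather than reprocessing everything.
                consumer.commitSync();
            }
        }
    }
}
```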

ZooKeeper also serves many other purposes, such as leader election, configuration management, synchronization, and detecting when a node joins or leaves the cluster.

Future Kafka releases plan to remove the ZooKeeper dependency (KIP-500), but as of now it is an integral part of Kafka.

Here are a few lines taken from their FAQ page:

Once the ZooKeeper quorum is down, brokers could end up in a bad state and be unable to serve client requests normally. Although the Kafka brokers should resume to a normal state automatically when the ZooKeeper quorum recovers, there are still a few corner cases where they cannot, and a hard kill-and-recovery is required to bring them back to normal. Hence it is recommended to closely monitor your ZooKeeper cluster and provision it so that it is performant.

For more details, see the Kafka FAQ page.
