In Kafka, I would like to use only a single broker, single topic and a single partition having one producer and multiple consumers (each consumer getting its own copy of data from the broker). Given this, I do not want the overhead of using Zookeeper; Can I not just use the broker only? Why is a Zookeeper must?
相关问题
- Delete Messages from a Topic in Apache Kafka
- Serializing a serialized Thrift struct to Kafka in
- Kafka broker shutdown while cleaning up log files
- Getting : Error importing Spark Modules : No modul
- How to transform all timestamp fields when using K
相关文章
- Kafka doesn't delete old messages in topics
- Kafka + Spark Streaming: constant delay of 1 secon
- Spring Kafka Template implementaion example for se
- How to fetch recent messages from Kafka topic
- Determine the Kafka-Client compatibility with kafk
- Kafka to Google Cloud Platform Dataflow ingestion
- Kafka Producer Metrics
- Spark Structured Streaming + Kafka Integration: Mi
Important update - August 2019:
ZooKeeper dependency will be removed from Apache Kafka. See the high level discussion in KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum.
These efforts will take a few Kafka releases and additional KIPs. Kafka Controllers will take over the tasks of current ZooKeeper tasks. The Controllers will leverage the benefits of the Event Log which is a core concept of Kafka.
Some benefits of the new Kafka architecture are a simpler architecture, ease of operations and better scalability (e.g. allow "unlimited partitions".
Other than the usual payload message transfer, there are many other communications that happens in kafka. like * Events related to brokers requesting the cluster membership * Events related to Brokers becoming available * Getting bootstrap config setups. * Events related to controller and leader updates. * Help status updates like Heartbeat updates.
Zookeeper itself is a distributed system consisting of multiple nodes in an ensemble. Zookeeper is centralised service for maintaining such metadata.
Zookeeper is centralizing and management system for any kind of distributed systems. Distributed system is different software modules running on different nodes/clusters (might be on geographically distant locations) but running as one system. Zookeeper facilitates communication between the nodes, sharing configurations among the nodes, it keeps track of which node is leader, which node joins/leaves, etc. Zookeeper is the one who keeps distributed systems sane and maintains consistency. Zookeeper basically is an orchestration platform.
Kafka is a distributed system. And hence it needs some kind of orchestration for its nodes that might be geographically distant (or not).
Yes, Zookeeper is must by design for Kafka. Because Zookeeper has the responsibility a kind of managing Kafka cluster. It has list of all Kafka brokers with it. It notifies Kafka, if any broker goes down, or partition goes down or new broker is up or partition is up. In short ZK keeps every Kafka broker updated about current state of the Kafka cluster.
Then every Kafka client(producer/consumer) all need to do is connect with any single broker and that broker has all metadata updated by Zookeeper, so client need not to bother about broker discovery headache.
Updated on Feb 2020
For the latest version (2.4.0) ZooKeeper is still required for running Kafka, but in the near future ZooKeeper will be replaced with a Self-Managed Metadata Quorum.
See details in the accepted KIP-500.
Kafka is built to use Zookeeper. There is no escaping from that.
Kafka is a distributed system and uses Zookeeper to track status of kafka cluster nodes. It also keeps track of Kafka topics, partitions etc.
Looking at your question, it seems you do not need Kafka. You can use any application that supports pub-sub such as Redis, Rabbit MQ or hosted solutions such as Pub-nub.