I recently started learning Kafka and ended up with these questions.
What is the difference between Consumer and Stream? To me, any tool or application that consumes messages from Kafka is a consumer in the Kafka world.
How is Stream different, given that it also consumes messages from and produces messages to Kafka? And why is it needed, when we can write our own consumer application using the Consumer API and process the messages as needed, or send them to Spark from the consumer application?
I searched Google for this but did not find any good answers. Sorry if this question is too trivial.
Update April 09, 2018: Nowadays you can also use KSQL, the streaming SQL engine for Kafka, to process your data in Kafka. KSQL is built on top of Kafka's Streams API, and it too comes with first-class support for "streams" and "tables". Think of it like the SQL brother of Kafka Streams where you don't have to write any programming code in Java or Scala.
Kafka's Streams API (https://kafka.apache.org/documentation/streams/) is built on top of Kafka's producer and consumer clients. It's significantly more powerful and also more expressive than the Kafka consumer client. Here are some of the features of the Kafka Streams API:
It ships with (1) a DSL with operations such as map, filter, and reduce, as well as (2) an imperative-style Processor API for, e.g., doing complex event processing (CEP), and (3) you can even combine the DSL and the Processor API.
See http://docs.confluent.io/current/streams/introduction.html for a more detailed but still high-level introduction to the Kafka Streams API, which should also help you understand how it differs from the lower-level Kafka consumer client. There's also a Docker-based tutorial for the Kafka Streams API, which I blogged about earlier this week.
Yes, the Kafka Streams API can both read data from and write data to Kafka.
Yes, you could write your own consumer application -- as I mentioned, the Kafka Streams API uses the Kafka consumer client (plus the producer client) itself -- but you'd have to manually implement all the unique features that the Streams API provides. See the list above for everything you get "for free". It is thus rather a rare circumstance that a user would pick the low-level consumer client rather than the more powerful Kafka Streams API.
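To make the difference in style concrete, here is a minimal sketch in plain Java. It does not use Kafka at all: the Event class, the method names, and the in-memory batch are hypothetical stand-ins, and java.util.stream is used only as an analogy for the map/filter/reduce style of the Streams DSL. The first method mirrors what you would typically write inside a consumer poll loop, with per-key state kept by hand; the second expresses the same logic declaratively.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ConsumerVsStreamsSketch {

    // Hypothetical stand-in for a Kafka record: a key and a numeric value.
    public static final class Event {
        public final String key;
        public final int value;

        public Event(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Consumer-client style: an explicit loop over a polled batch,
    // with the per-key state (the running totals) managed by hand.
    public static Map<String, Integer> sumWithLoop(List<Event> batch) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Event e : batch) {
            if (e.value <= 0) {
                continue;                               // "filter"
            }
            int doubled = e.value * 2;                  // "map"
            totals.merge(e.key, doubled, Integer::sum); // "reduce" per key
        }
        return totals;
    }

    // DSL style (analogy only, using java.util.stream rather than Kafka Streams):
    // the same logic expressed as a chain of filter, map, and a per-key reduce.
    public static Map<String, Integer> sumWithPipeline(List<Event> batch) {
        return batch.stream()
                .filter(e -> e.value > 0)
                .map(e -> new Event(e.key, e.value * 2))
                .collect(Collectors.groupingBy(
                        e -> e.key,
                        TreeMap::new,
                        Collectors.reducing(0, e -> e.value, Integer::sum)));
    }

    public static void main(String[] args) {
        List<Event> batch = List.of(
                new Event("a", 1), new Event("b", -5),
                new Event("a", 3), new Event("b", 2));
        System.out.println(sumWithLoop(batch));     // prints {a=8, b=4}
        System.out.println(sumWithPipeline(batch)); // prints {a=8, b=4}
    }
}
```

Both methods produce the same totals on a finite in-memory list. The sketch only shows the difference in programming style; on an unbounded Kafka topic, the hand-written version is where you would additionally have to build everything the Streams API otherwise provides for you.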