My understanding as per Kafka stream documentation,
Maximum possible parallel tasks is equal to maximum number of partitions of a topic among all topics in a cluster.
I have around 60 topics at Kafka cluster. Each topic has single partition only.
Is it possible to achieve scalability/parallelism with Kafka stream for my Kafka cluster?
Do you want to do the same computation over all topics? For this, I would recommend to introduce an extra topic with many partitions that you use to scale out:
// using new 1.0 API
StreamsBuilder builder = new StreamsBuilder():
KStream parallelizedStream = builder
.stream(/* subscribe to all topics at once*/)
.through("topic-with-many-partitions");
// apply computation
parallelizedStream...
Note: You need to create the topic "topic-with-many-partitions" manually before starting your Streams application
Pro Tip:
The topic "topic-with-many-partitions" can have a very short retention time as it's only used for scaling and must not hold data long term.
Update
If you have 10 topic T1 to T10 with a single partitions each, the program from above will execute as follows (with TN being the dummy topic with 10 partitions):
T1-0 --+ +--> TN-0 --> T1_1
... --+--> T0_0 --+--> ... --> ...
T10-0 --+ +--> TN-10 --> T1_10
The first part of your program will only read all 10 input topics and write it back into 10 partitions of TN. Afterwards, you can get up to 10 parallel tasks, each processing one input partition. If you start 10 KafakStreams
instances, only one will execute T0_0, and each will alsa one T1_x running.