is there a way to specify the number of Threads that a KSQL query running on a KSQL Server should consume ? Is other words the parallelism of the query.
Is there any limit to the number of application that can be run on a KSQL Server ? When or how to decide to Scale out ?
Yes, you can specify ksql-streams-num-streams-threads
property. You can read more about it here.
Now, this is the number of KSQL Streams threads where stream processing occurs for that particular KSQL instance. It's important for vertical scaling because you might have enough computation resources in your machine to handle more threads and therefore you can do more work processing your streams on that specific machine.
If you have the capacity (i.e: CPU Cores), then you should have more threads so more Stream tasks can be scheduled on that instance and therefor having additional parallelization capacity on your KSQL Instance or Cluster (if you have more than one instance).
What you must understand with Kafka, Kafka Streams and KSQL is that horizontal scaling occurs with two main concepts:
- Kafka Streams applications (such as KSQL) can paralelize work based
on the number of kafka topic partitions. If you have 3 partitions
and you launch 4 KSQL Instances (i.e: on different servers), then one of them will not be doing work on a Stream you create on top of that topic. If you have the
same topic with 3 partitions and you have only 1 KSQL Server, he'll
be doing all of the work for the 3 partitions.
- When you add a new instance of your application Kafka Stream Application (in your case KSQL) and it joins your cluster processing your KSQL Streams and Tables, this specific instance will join the consumer groups consuming for
those topics and immediately start sharing the load with the other
instances as long as there are available partitions that other instances can offload (triggering a consumer group rebalance). The same happens if you take a instance down... the other instances will pick up the slack and start processing the partition(s) the retired instance was processing.
When comparing to vertical scaling (i.e: adding more capacity and threads to a KSQL instance), horizontal scaling does the same by adding the same computational resources to a different instance of the application on a different machine. You can understand the Kafka Stream Application Threading Model (with one or more application instances, on one or more machines) here:
I tried to simplify it, but you can read more of it on the KSQL Capacity Planning page and Confluent Kafka Streams Elastic Scale Blog Post
The important aspects of the scale-out / scale-in lifecycle of Kafka Streams (and KSQL) applications can be better understood like this:
1. A single instance working on 4 different partitions
2. Three instances working on 4 different partitions (one of them is
working on 2 different partitions)
3. An instances just left the group, now two instances are working on 4
different partitions, perfectly balanced (2 partitions for each)
(Images from confluent blog)