In the context of Kubernetes or otherwise, does it make sense to have one KSQL Server per application? When I read the capacity planning guide for KSQL Server, it seems the basic settings are meant for running multiple queries on one server.
However, I feel that to have better control over scaling up and down with Kubernetes, it would make more sense to fix the number of threads per query and launch a server configured in Kubernetes with, say, 1 CPU, where only one application would run. However, I am not sure how heavy a KSQL Server is, or whether that actually makes sense.
Any recommendations?
First of all, what you have described is clearly doable. You can run KSQL Server with Docker, so you could have a container orchestrator such as Kubernetes or Swarm maintaining and scheduling those KSQL Server instances.
Here is how this would play out:
- Each KSQL instance will join a group of other KSQL instances with the same KSQL_SERVICE_ID that use the same Kafka cluster, defined by KSQL_KSQL_STREAMS_BOOTSTRAP_SERVERS.
- You can create several KSQL Server clusters, e.g. one per application, by giving each a different KSQL_SERVICE_ID while they all use the same Kafka cluster.
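To make the grouping rule concrete, here is a minimal sketch that builds one container environment per application: every application gets its own service ID (and therefore its own KSQL Server cluster), while all of them point at the same Kafka cluster. The application names, service IDs and the `kafka:9092` address are hypothetical, and the variable names are the ones used in this answer; check your image's documentation for the exact names it expects.

```python
# Sketch: one KSQL Server cluster per application, all sharing one Kafka cluster.
# App names, service IDs and the bootstrap address below are hypothetical.
SHARED_BOOTSTRAP = "kafka:9092"  # hypothetical Kafka bootstrap address


def ksql_env(service_id: str, bootstrap: str = SHARED_BOOTSTRAP) -> dict[str, str]:
    """Environment variables for one KSQL Server container.

    Variable names follow this answer; the image may expect slightly
    different names, so treat them as an assumption.
    """
    return {
        "KSQL_SERVICE_ID": service_id,  # same value => same KSQL Server cluster
        "KSQL_KSQL_STREAMS_BOOTSTRAP_SERVERS": bootstrap,  # shared Kafka cluster
    }


# Hypothetical applications mapped to their service IDs. Every replica of
# "orders" gets the same ID, so those replicas join the same KSQL cluster.
APPS = {"orders": "orders_", "payments": "payments_"}

envs = {app: ksql_env(sid) for app, sid in APPS.items()}

if __name__ == "__main__":
    for app, env in envs.items():
        print(app, env)
```

Scaling one application then means changing the replica count of the pods that share its service ID, without touching the other applications.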
As a result, you now have:
- Multiple containerized KSQL Server instances managed by a container orchestrator such as Kubernetes.
- All of the KSQL instances connected to the same Kafka cluster (you can also have different Kafka clusters for different KSQL_SERVICE_ID values).
- KSQL Server instances grouped into different applications (different KSQL_SERVICE_ID values) to achieve separation of concerns, so that scalability, security and availability can be better maintained.
Regarding the coexistence of several KSQL Server instances (possibly with different KSQL_SERVICE_ID values) on the same machine, be aware that a greedy instance can monopolize the available machine resources, causing problems for less greedy instances. With Kubernetes you can set resource limits on your Pods to avoid this, but greedy instances will then be throttled and slowed down.
Confluent advice regarding multi-tenancy:
We recommend against using KSQL in a multi-tenant fashion. For example, if you have two KSQL applications running on the same node, and one is greedy, you're likely to encounter resource issues related to multi-tenancy. We recommend using a single pool of KSQL Server instances per use case. You should deploy separate applications onto separate KSQL nodes, because it becomes easier to reason about scaling and resource utilization. Also, deploying per use case makes it easier to reason about failovers and replication.
A possible drawback is the overhead of running multiple KSQL Server instances (each with the footprint of a Java application) in the same pool while they have no work to do, i.e. no schedulable tasks because your topic(s) have too few partitions, or simply because the workload is very light. You might be able to do the same job with fewer instances and avoid idle or nearly idle ones.
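The partition argument can be made concrete with a back-of-the-envelope model. The sketch below is a deliberate simplification, not how KSQL or Kafka Streams actually compute assignments: it assumes a single persistent query reading one source topic, one task per input partition, a fixed number of stream threads per instance for that query, and an assignor that spreads tasks evenly so every instance gets one task before any gets a second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utilisation:
    busy_instances: int
    idle_instances: int
    idle_threads: int


def utilisation(partitions: int, instances: int, threads_per_instance: int) -> Utilisation:
    """Simplified model of how many instances/threads get work for one query.

    Assumptions: one source topic, one task per partition, tasks spread
    evenly across instances. Real assignments can differ.
    """
    if min(partitions, instances, threads_per_instance) < 1:
        raise ValueError("all arguments must be positive")
    tasks = partitions  # one task per input partition
    busy_instances = min(instances, tasks)  # even spreading assumption
    total_threads = instances * threads_per_instance
    busy_threads = min(total_threads, tasks)  # a thread without a task is idle
    return Utilisation(
        busy_instances=busy_instances,
        idle_instances=instances - busy_instances,
        idle_threads=total_threads - busy_threads,
    )


if __name__ == "__main__":
    # 4 partitions but 6 single-threaded instances: 2 instances sit idle.
    print(utilisation(partitions=4, instances=6, threads_per_instance=1))
```

Under these assumptions, adding instances beyond the partition count of the input topic only adds idle JVMs; to scale further you would first need more partitions.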
Of course, stuffing all stream processing, possibly for completely different use cases or projects, onto a single KSQL Server or pool of KSQL Servers may bring its own problems: internal concurrency issues, development-cycle complexity, management overhead, etc.
I think something in the middle will work fine: use a pool of KSQL Server instances per project or use case, which in turn might translate to a pipeline consisting of a topology of several sources, processors and sinks, implemented by a number of KSQL queries.
Also, don't forget about the scaling mechanisms of Kafka, Kafka Streams and KSQL (built on top of Kafka Streams) discussed in the previous question you posted.
All of these mechanisms are described here:
https://docs.confluent.io/current/ksql/docs/capacity-planning.html
https://docs.confluent.io/current/ksql/docs/concepts/ksql-architecture.html
https://docs.confluent.io/current/ksql/docs/installation/install-ksql-with-docker.html