I'm using gRPC with Python as client/server inside kubernetes pods... I would like to be able to launch multiple pods of the same type (gRPC servers) and let the client connect to them (randomly).
I dispatched 10 pods of the server and set up a 'service' to target them. Then, in the client, I connected to the DNS name of the service - meaning kubernetes should do the load-balancing and direct me to a random server pod. In reality, the client calls the gRPC functions (which works well) but when I look at the logs I see that all calls are going to the same server pod.
I presume the client is doing some kind of DNS caching which leads to all calls being sent to the same server. Is this the case? Is there any way to disable it and set the same stub client to make a "new" call and fetch a new IP via DNS with each call?
I am aware of the overhead I might cause if it queries the DNS server each time, but distributing the load is much more important for me at the moment.
If you've created a vanilla Kubernetes service, the service should have its own load-balanced virtual IP. To check, see whether
kubectl get svc your-service
shows a CLUSTER-IP for your service. If this is the case, DNS caching should not be an issue, because that single virtual IP should be splitting traffic among the actual backends.
Try
kubectl get endpoints your-service
to confirm that your service actually knows about all of your backends.
If you have a headless service, a DNS lookup will return an A record with 10 IPs (one for each of your Pods). If your client is always choosing the first IP in an A record, that would also explain the behavior you're seeing.
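To see what the client side actually gets back from DNS, a small sketch like the following may help. It uses only the standard library; the function name resolve_all and the resolver parameter are illustrative choices, and "your-service" and port 50051 are placeholders rather than values from the question.

```python
import socket


def resolve_all(host, port, resolver=socket.getaddrinfo):
    """Return the distinct IP addresses `host` resolves to, in DNS order.

    `resolver` defaults to the system resolver; it is a parameter only so
    the function can be exercised without a real DNS lookup.
    """
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port, ...) for IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# Run inside a client pod, for example:
#   resolve_all("your-service", 50051)
```

With a regular service this should return one address (the cluster IP); with a headless service it should return one address per ready Pod.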
Let me take the opportunity to answer by describing how things are supposed to work.
The way client-side LB works in the gRPC C core (the foundation for all but the Java and Go flavors of gRPC) is as follows (the authoritative doc can be found here):
Client-side LB is kept simple and "dumb" on purpose. The way we've chosen to implement complex LB policies is through an external LB server (as described in the aforementioned doc). You aren't concerned with this scenario. Instead, you are simply creating a channel, which will use the (default) pick-first LB policy.
The input to an LB policy is a list of resolved addresses. When using DNS, if foo.com resolves to [10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4], the policy will try to establish a connection to all of them. The first one to successfully connect will become the chosen one until it disconnects. Thus the name "pick-first". A longer name could have been "pick first and stick with it for as long as possible", but that made for a very long file name :). If/when the picked one gets disconnected, the pick-first policy will move over to returning the next successfully connected address (internally referred to as a "connected subchannel"), if any. Once again, it'll continue to choose this connected subchannel for as long as it stays connected. If all of them fail, the call would fail.
The problem here is that DNS resolution, being intrinsically pull based, is only triggered 1) at channel creation and 2) upon disconnection of the chosen connected subchannel.
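The behavior described above can be illustrated with a toy model. This is not gRPC code: the class PickFirstModel and its methods are hypothetical, and it simplifies "first to connect" to "first reachable address in list order". It only shows why every call lands on the same backend until that backend disconnects.

```python
class PickFirstModel:
    """Toy model of pick-first: choose one address and stick with it."""

    def __init__(self, resolve):
        self._resolve = resolve      # callable returning a list of addresses
        self._addresses = resolve()  # resolution happens at "channel creation"
        self._picked = None

    def pick(self, reachable):
        """Return the address used for the next call.

        `reachable` is the set of addresses that can currently be connected to.
        """
        if self._picked is not None and self._picked in reachable:
            return self._picked  # still connected: keep using it
        if self._picked is not None:
            # The picked address disconnected, which triggers a re-resolution.
            self._addresses = self._resolve()
        self._picked = next(
            (addr for addr in self._addresses if addr in reachable), None
        )
        if self._picked is None:
            raise ConnectionError("no address is reachable; the call fails")
        return self._picked
```

In this model, repeated calls to pick() keep returning the first address, and a different one is chosen only after the first becomes unreachable, which matches the logs described in the question.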
As of right now, a hacky solution would be to create a new channel for every request (very inefficient, but it'd do the trick given your setup).
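A minimal sketch of that workaround in Python might look like this. The helper name call_on_fresh_channel and its parameters are illustrative; stub_class stands for whatever stub class your generated *_pb2_grpc module provides, and an insecure (plaintext) channel is assumed.

```python
def call_on_fresh_channel(target, stub_class, invoke, channel_factory=None):
    """Open a new channel, make one call through it, then close the channel.

    `invoke` receives the stub and performs the RPC, e.g.
    lambda stub: stub.SomeMethod(request)  # SomeMethod is a placeholder
    """
    if channel_factory is None:
        # Imported lazily so the helper can be tried with a fake factory;
        # the real path requires the grpcio package.
        import grpc
        channel_factory = grpc.insecure_channel
    channel = channel_factory(target)
    try:
        return invoke(stub_class(channel))
    finally:
        channel.close()  # release the connection so nothing is reused
```

Each call pays for a fresh connection setup, which is the inefficiency mentioned above.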
Changes coming in Q1 2017 (see https://github.com/grpc/grpc/issues/7818) will allow clients to choose a different LB policy, namely Round Robin. In addition, we may look into introducing a "randomize" bit to that client config, which would shuffle the addresses prior to doing Round-Robin over them, effectively achieving what you intend.
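In later grpcio releases, the policy can be selected through the "grpc.lb_policy_name" channel argument. The sketch below assumes such a release and an insecure channel; the helper names are illustrative, and the target in the comment is a placeholder.

```python
def round_robin_channel_args():
    """Channel arguments asking the C core for the round_robin LB policy."""
    return [("grpc.lb_policy_name", "round_robin")]


def open_round_robin_channel(target, channel_factory=None):
    """Create a channel that spreads calls over all resolved addresses."""
    if channel_factory is None:
        # Imported lazily so the helper can be tried with a fake factory;
        # the real path requires the grpcio package.
        import grpc
        channel_factory = grpc.insecure_channel
    return channel_factory(target, options=round_robin_channel_args())


# Example (placeholder target):
#   channel = open_round_robin_channel("dns:///your-service:50051")
```

Round robin only helps if the name resolves to several addresses, so this would normally be paired with a headless service; a single cluster IP gives the policy just one address to rotate over.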
The usual K8S load balancing doesn't work for gRPC. The following link explains why: https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/
Most modern ingress controllers can handle this, but they are either hot out of the oven (nginx), in an alpha version (traefik), or require the latest version of K8S (Linkerd). You can also do client-side load balancing, for which you can find a Java solution here.