Coming from years of running Node/Rails apps on bare metal, I was used to running as many apps as I wanted on a single machine (say, a 2 GB machine at DigitalOcean could easily handle 10 apps without worry, given proper optimization or a fairly low amount of traffic).
With Kubernetes, the game seems quite different. I've set up a "getting started" cluster with 2 standard VMs (3.75 GB).
I assigned requests and limits to a deployment as follows:
    resources:
      requests:
        cpu: "64m"
        memory: "128Mi"
      limits:
        cpu: "128m"
        memory: "256Mi"
I then see the following:
    Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
    ---------  ----  ------------  ----------  ---------------  -------------
    default    api   64m (6%)      128m (12%)  128Mi (3%)       256Mi (6%)
What does this 6% refer to?
I tried lowering the CPU limit to something like 20Mi, and the app does not start (obviously, not enough resources). The docs say it is a percentage of CPU. So, 20% of a 3.75 GB machine? Then where does this 6% come from?
I then increased the size of the node pool to n1-standard-2, and the same pod now takes up 3% of the node. That sounds logical, but what does it actually refer to?
I still wonder which metric is taken into account for this figure.
The app seems to need a large amount of memory on startup, but afterwards it uses only a small fraction of this 6%. So I feel like I am misunderstanding something, or misusing it all.
Thanks for any tips or advice from experience that would help me understand this better.
According to the docs, CPU requests (and limits) are always fractions of the available CPU cores on the node that the pod is scheduled on (with a resources.requests.cpu of "1" meaning that one CPU core is reserved for one pod). Fractions are allowed, so a CPU request of "0.5" will reserve half a CPU for one pod.

For convenience, Kubernetes allows you to specify CPU resource requests/limits in millicores, where 1000m equals one CPU core.
As already mentioned in the other answer, resource requests are guaranteed. This means that Kubernetes will schedule pods in a way that the sum of all requests will not exceed the amount of resources actually available on a node.
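As a rough illustration of that scheduling rule, here is a minimal sketch. It is not the scheduler's actual code; the function name `fits_on_node` and the node size of 1000m are assumptions made up for this example.

```python
def fits_on_node(allocatable_millicores, existing_requests, new_request):
    """Return True if the new CPU request still fits on the node.

    Simplified model: a pod fits only if the sum of all CPU requests
    (in millicores) stays within what the node has available.
    """
    return sum(existing_requests) + new_request <= allocatable_millicores


# Hypothetical node with 1000m available.
print(fits_on_node(1000, [64] * 10, 64))  # 640m already requested, 64m more fits
print(fits_on_node(1000, [64] * 15, 64))  # 960m already requested, 64m more does not fit
```

Note that only the requests enter this check, not the limits and not the actual usage.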
So, by requesting 64m of CPU time in your deployment, you are actually requesting 64/1000 = 0.064 = 6.4% of the time of one of the node's CPU cores. That is where your 6% comes from. When you upgrade to a VM with more CPU cores, the amount of available CPU resources increases, so on a machine with two available CPU cores, a request for 6.4% of one CPU's time allocates 3.2% of the CPU time of two CPUs.

The 6% of CPU means that 6% (CPU requests) of the node's CPU time is reserved for this pod, so it is guaranteed to always get at least this amount of CPU time. It can still burst up to 12% (CPU limits) if there is CPU time left.
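The arithmetic behind these percentages can be sketched in a few lines of Python. The helper names `parse_cpu` and `percent_of_node` are invented for this example, and the sketch assumes every core is fully available to pods; a real node typically reserves some CPU for system components, so the figures reported for a live cluster may differ slightly.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "64m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def percent_of_node(quantity, node_cores):
    """Share of the node's total CPU time, assuming 1000m per core."""
    return 100 * parse_cpu(quantity) / (node_cores * 1000)


for cores in (1, 2):
    request = percent_of_node("64m", cores)
    limit = percent_of_node("128m", cores)
    print(f"{cores} core(s): request {request:.1f}%, limit {limit:.1f}%")
```

With one core this gives 6.4% for the request and 12.8% for the limit; with two cores the same request comes to 3.2%, in line with the 6%, 12% and 3% seen in the question.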
This means that if the limit is very low, your application will take longer to start up. A liveness probe may therefore kill the pod before it is ready, because the application took too long. To solve this, you may have to increase the initialDelaySeconds or the timeoutSeconds of the liveness probe.

Also note that the resource requests and limits define how many resources your pod allocates, not its actual usage.
Therefore the percentages tell you how much CPU and memory of the total resources your pod allocates.
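To get a feel for the liveness-probe point above, here is a very rough timing model. It is only a sketch: the function names are made up, it ignores timeoutSeconds and any scheduling jitter, and it additionally assumes the probe's periodSeconds and failureThreshold settings (defaulting to 10 and 3), which are not discussed in this answer.

```python
def startup_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate time the app has to start answering probes.

    Assumed model: the first probe runs after initial_delay seconds,
    then one probe every period seconds; the container is restarted
    once failure_threshold probes in a row have failed.
    """
    return initial_delay + (failure_threshold - 1) * period


def survives_startup(startup_seconds, initial_delay, period=10, failure_threshold=3):
    """True if an app needing startup_seconds to boot is up before the restart."""
    budget = startup_budget_seconds(initial_delay, period, failure_threshold)
    return startup_seconds <= budget


# Hypothetical app that needs 45 seconds to boot under a tight CPU limit.
print(survives_startup(45, initial_delay=10))  # budget is about 30 s
print(survives_startup(45, initial_delay=60))  # budget is about 80 s
```

Under this model, raising initialDelaySeconds is the most direct way to give a slow-starting app more room.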
Link to the docs: https://kubernetes.io/docs/user-guide/compute-resources/
Some other notable things: