GKE does not scale to/from 0 when autoscaling enabled

Posted 2020-08-22 03:07

Question:

I want to run a CronJob on my GKE cluster to perform a batch operation on a daily basis. Ideally, the cluster would scale to 0 nodes while the job is not running, then scale up to 1 node and run the job on it each time the schedule fires.

As a first step, I am trying this with the simple CronJob from the Kubernetes documentation, which only prints the current time and terminates.

I first created a cluster with the following command:

gcloud container clusters create $CLUSTER_NAME \
    --enable-autoscaling \
    --min-nodes 0 --max-nodes 1 --num-nodes 1 \
    --zone $CLUSTER_ZONE

Then, I created a CronJob with the following description:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: Never

The job is scheduled to run every hour and to print the current time before terminating.

First, I wanted to create the cluster with 0 nodes, but setting --num-nodes 0 results in an error. Why is that? Note that I can manually scale the cluster down to 0 nodes after it has been created.

Second, if my cluster has 0 nodes, the job is never scheduled: the cluster does not scale up to 1 node automatically and instead reports the following error:

Cannot schedule pods: no nodes available to schedule pods.

Third, if my cluster has 1 node, the job runs normally, but afterwards the cluster does not scale down to 0 nodes and stays at 1 node. I let the cluster run through two successive jobs and it did not scale down in between. I would expect one hour to be long enough for that to happen.

What am I missing?

EDIT: I've got it to work and detailed my solution here.

Answer 1:

Update:

Note: Beginning with Kubernetes version 1.7, you can specify a minimum size of zero for your node pool. This allows your node pool to scale down completely if the instances within aren't required to run your workloads.

https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
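As a rough illustration of the note above, an existing node pool can be given a minimum size of zero with `gcloud container clusters update`. This is only a sketch: the default cluster name, the zone, and the pool name `default-pool` are assumptions, and the script merely prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
set -eu

# Assumed placeholder values; override them through the environment.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
CLUSTER_ZONE="${CLUSTER_ZONE:-us-central1-a}"
NODE_POOL="${NODE_POOL:-default-pool}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Let the autoscaler shrink the chosen node pool down to 0 nodes.
run gcloud container clusters update "$CLUSTER_NAME" \
    --enable-autoscaling --min-nodes 0 --max-nodes 1 \
    --node-pool "$NODE_POOL" \
    --zone "$CLUSTER_ZONE"
```

Keep in mind the caveat from the old answer below: if this is the only node pool, the system pods still need somewhere to run.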


Old answer:

Scaling the entire cluster to 0 is not supported, because you always need at least one node for system pods:

See docs

You could create one node pool with a small machine for system pods, and an additional node pool with a big machine where you would run your workload. This way the second node pool can scale down to 0 and you still have space to run the system pods.

After trying this, @xEc reports: Also note that there are scenarios in which my node pool wouldn't scale, like if I created the pool with an initial size of 0 instead of 1.
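The two-pool layout described above might look roughly like the following. Treat it as a sketch under assumptions: the pool name `workload-pool`, the machine types, and the default cluster name and zone are illustrative and not taken from the question. The script prints the commands by default and only executes them with `DRY_RUN=0`.

```shell
#!/bin/sh
set -eu

# Assumed placeholder values; override them through the environment.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
CLUSTER_ZONE="${CLUSTER_ZONE:-us-central1-a}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Small, fixed-size default pool that keeps the system pods running.
run gcloud container clusters create "$CLUSTER_NAME" \
    --num-nodes 1 --machine-type e2-small \
    --zone "$CLUSTER_ZONE"

# Separate pool for the batch workload that the autoscaler may shrink to 0.
# It starts at 1 node because a pool created with size 0 reportedly
# did not scale up for the asker.
run gcloud container node-pools create workload-pool \
    --cluster "$CLUSTER_NAME" \
    --machine-type e2-standard-4 \
    --enable-autoscaling --min-nodes 0 --max-nodes 1 --num-nodes 1 \
    --zone "$CLUSTER_ZONE"
```

The job's pods still have to land on the workload pool rather than on the small one, for example through resource requests that only fit the larger machine type.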

Initial suggestion:

Perhaps you could run a micro VM, with cron to scale the cluster up, submit a Job (instead of CronJob), wait for it to finish and then scale it back down to 0?
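A script for that micro VM could be sketched as follows. Everything named here is an assumption for illustration: the manifest file `job.yaml`, the job name `hello`, the 30-minute timeout, and the default cluster name and zone. It also assumes the cluster has a single node pool; with several pools, `gcloud container clusters resize` needs a `--node-pool` argument. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
set -eu

# Assumed placeholder values; override them through the environment.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
CLUSTER_ZONE="${CLUSTER_ZONE:-us-central1-a}"
JOB_NAME="${JOB_NAME:-hello}"
JOB_MANIFEST="${JOB_MANIFEST:-job.yaml}"   # hypothetical Job manifest
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run_batch() {
  # Scale up to one node and point kubectl at the cluster.
  run gcloud container clusters resize "$CLUSTER_NAME" \
      --num-nodes 1 --zone "$CLUSTER_ZONE" --quiet
  run gcloud container clusters get-credentials "$CLUSTER_NAME" \
      --zone "$CLUSTER_ZONE"

  # Submit the Job and wait for it to complete (or time out).
  run kubectl apply -f "$JOB_MANIFEST"
  status=0
  run kubectl wait --for=condition=complete "job/$JOB_NAME" \
      --timeout=1800s || status=$?

  # Clean up and scale back down even if the wait failed.
  run kubectl delete job "$JOB_NAME" --ignore-not-found
  run gcloud container clusters resize "$CLUSTER_NAME" \
      --num-nodes 0 --zone "$CLUSTER_ZONE" --quiet
  return "$status"
}

run_batch
```

On the VM, a crontab entry such as `0 1 * * * DRY_RUN=0 /opt/batch/run_batch.sh` (a hypothetical path) would run this once a day.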



Answer 2:

I do not think it's a good idea to tweak GKE for this kind of job. If you really need 0 instances I'd suggest you use either

  1. App Engine Standard Environment, which allows you to scale instances to 0 (https://cloud.google.com/appengine/docs/standard/go/config/appref), or
  2. Cloud Functions, which are 'instanceless'/serverless anyway. You can use this unofficial guide to trigger your Cloud Functions (https://cloud.google.com/community/tutorials/using-stackdriver-uptime-checks-for-scheduling-cloud-functions)