We have a Kubernetes cluster of web scraping cron jobs set up. All seems to go well until a cron job starts to fail (e.g., when a site structure changes and our scraper no longer works). It looks like every now and then a few failing cron jobs will continue retrying to the point that they bring down our cluster. Running kubectl get cronjobs (prior to a cluster failure) will show too many jobs running for a failing job.
I've attempted following the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.
Here is our config for reference:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3
Ideally we would prefer that a cron job would be terminated after N retries (e.g., something like kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!
You can tell your Job to stop retrying using backoffLimit.
In your case: You set 3 as the backoffLimit of your Job. That means when a Job is created by the CronJob, it will retry 3 times if it fails. This controls the Job, not the CronJob. When a Job fails, another Job will still be created on the next scheduled run.
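To make the distinction concrete, here is a minimal sketch of the kind of Job object the CronJob above creates each period, reconstructed from your config (the metadata name is illustrative; the real name is generated per run). Note that backoffLimit sits on the Job spec, so it only limits Pod retries inside one Job run:

apiVersion: batch/v1
kind: Job
metadata:
  name: scrape-al-1234567890   # generated per run; this name is illustrative
  labels:
    app: scrape
    scrape: al
spec:
  backoffLimit: 3              # limits retries within this single Job only
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: scrape-al
          image: 'govhawk/openstates:1.3.1-beta'
          command:
            - /opt/openstates/openstates/pupa-scrape.sh
          args:
            - al bills --scrape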
You want: If I am not wrong, you want to stop scheduling new Jobs when your scheduled Jobs have failed 5 times. Right?
Answer: In that case, this is not possible automatically.
Possible solution: You need to suspend the CronJob so that it stops scheduling new Jobs.
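For example, a CronJob can be suspended by setting .spec.suspend to true (a sketch using the name from your config):

kubectl patch cronjob scrape-al -p '{"spec" : {"suspend" : true }}'

and resumed later by setting it back to false:

kubectl patch cronjob scrape-al -p '{"spec" : {"suspend" : false }}'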
You can run such a command manually whenever you notice the failures. If you do not want to do this manually, you need to set up a watcher that watches your CronJob's Jobs and updates the CronJob to suspended when necessary; a rough sketch follows.
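Here is a rough sketch of such a watcher, assuming it runs somewhere with kubectl access to the cluster and that counting Jobs with failed Pods via the scrape=al label is close enough to "failed 5 times". The threshold and polling interval are illustrative, and note that with failedJobsHistoryLimit: 0 failed Jobs are deleted immediately, so that limit would need to be raised for this kind of counting to work:

#!/bin/sh
# Hypothetical watcher: suspend the CronJob once its Jobs have failed often enough.
# CRONJOB, the label selector, THRESHOLD and the sleep interval are illustrative.
CRONJOB=scrape-al
THRESHOLD=5

while true; do
  # Count Jobs labelled scrape=al that report at least one failed Pod.
  failed=$(kubectl get jobs -l scrape=al \
    -o jsonpath='{range .items[*]}{.status.failed}{"\n"}{end}' | grep -c '^[1-9]')

  if [ "$failed" -ge "$THRESHOLD" ]; then
    # Stop the CronJob from scheduling any new Jobs.
    kubectl patch cronjob "$CRONJOB" -p '{"spec" : {"suspend" : true }}'
    break
  fi
  sleep 60
done

A more robust version would watch the API directly (e.g., a small controller) instead of polling, but the idea is the same: detect repeated failures and flip .spec.suspend.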