Why is the Prometheus operator not able to start?

Posted 2019-04-17 15:06

Question:

I'm trying to deploy Prometheus with the Prometheus Operator in a fresh Kubernetes cluster, using the following files:

  1. I create a namespace named monitoring.
  2. I apply this file, which works fine:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      priorityClassName: "operator-critical"
      tolerations:
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoSchedule"
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoExecute"
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
        image: quay.io/coreos/prometheus-operator:v0.29.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
      nodeSelector:
      serviceAccountName: prometheus-operator

Now I want to apply this file (a Prometheus custom resource):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
  labels: 
    prometheus: prometheus
spec:
  replicas: 1
  priorityClassName: "operator-critical"
  serviceAccountName: prometheus
  nodeSelector:
    worker.garden.sapcloud.io/group: operator
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      role: observeable
  tolerations:
  - key: "WorkGroup"
    operator: "Equal"
    value: "operator"
    effect: "NoSchedule"
  - key: "WorkGroup"
    operator: "Equal"
    value: "operator"
    effect: "NoExecute"

Before that, I created the CRDs from:

https://github.com/coreos/prometheus-operator/tree/master/example/prometheus-operator-crd

The problem is that the operator pods are not able to start (0/2 ready). What could be the problem? Please advise.

Update:

When I look at the events for the Prometheus operator, I see the following error reported by the replicaset-controller:

Error creating: pods "prometheus-operator-6944778645-" is forbidden: no PriorityClass with name operator-critical was found

Any idea?

Answer 1:

You are referencing the operator-critical priority class, but no PriorityClass with that name exists in the cluster, so pod creation is rejected. A priority class determines the scheduling priority of a pod relative to other pods, including which pods may be preempted when resources are scarce.

To fix this issue, either remove the explicit priority class (priorityClassName: "operator-critical") from both files, or create the operator-critical class:

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: operator-critical
value: 1000000
globalDefault: false
description: "Critical operator workloads"
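
The manifest above uses the beta API version, which matches clusters from the time of the question. As a hedged variant for newer clusters: scheduling.k8s.io/v1 has been available since Kubernetes 1.14, and the v1beta1 version was removed in 1.22. The same object would then look like this (the value and description are carried over unchanged from the answer):

```yaml
# Same PriorityClass as above, declared with the GA API version.
# Assumes a cluster running Kubernetes 1.14 or later.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: operator-critical        # must match priorityClassName in the pod specs
value: 1000000                   # higher value means higher scheduling priority
globalDefault: false             # do not apply this class to pods without priorityClassName
description: "Critical operator workloads"
```

PriorityClass is cluster-scoped, so it takes no namespace and has to exist before the Deployment's ReplicaSet can create pods that reference it.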


Answer 2:

Prometheus and Alertmanager pods need persistent volumes to store their data. Make sure those PVs exist and are bound to the respective pods. Alternatively, you can give those pods ephemeral storage instead, and it should work.
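
As a rough sketch of the two storage options, the Prometheus custom resource accepts a storage section in its spec. The storage class name and the volume size below are hypothetical placeholders, not values from the question, and the rest of the spec from the question is omitted for brevity:

```yaml
# Sketch only: a Prometheus resource showing the storage section.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  # Option A: persistent storage through a PersistentVolumeClaim template.
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard   # hypothetical; use a StorageClass that exists in your cluster
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi            # hypothetical size
  # Option B: ephemeral storage; use this instead of Option A, not together with it.
  # storage:
  #   emptyDir: {}
```

With Option B the data is lost whenever the pod is rescheduled, which may be acceptable for a test cluster but usually not for long-term metrics.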