cannot fetch token error when using cloudsql-proxy

2019-05-06 04:37发布

问题:

I am using GKE with istio add-on enabled. Myapp somehow gives 503 errors using when using websocket. I am starting to think that maybe the websocket is working but the database connection is not and that causes 503's, as the cloudsql-proxy logs give errors:

$ kubectl logs myapp-54d6696fb4-bmp5m cloudsql-proxy
2019/01/04 21:56:47 using credential file for authentication; email=proxy-user@myproject.iam.gserviceaccount.com
2019/01/04 21:56:47 Listening on 127.0.0.1:5432 for myproject:europe-west4:mydatabase
2019/01/04 21:56:47 Ready for new connections
2019/01/04 21:56:51 New connection for "myproject:europe-west4:mydatabase"
2019/01/04 21:56:51 couldn't connect to "myproject:europe-west4:mydatabase": Post https://www.googleapis.com/sql/v1beta4/projects/myproject/instances/mydatabase/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: read tcp 10.44.11.21:60728->108.177.126.95:443: read: connection reset by peer
2019/01/04 22:14:56 New connection for "myproject:europe-west4:mydatabase"
2019/01/04 22:14:56 couldn't connect to "myproject:europe-west4:mydatabase": Post https://www.googleapis.com/sql/v1beta4/projects/myproject/instances/mydatabase/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: read tcp 10.44.11.21:36734->108.177.127.95:443: read: connection reset by peer

Looks like the required authentication details should be in the credentials of the proxy service account I created and thus is provided for:

{
  "type": "service_account",
  "project_id": "myproject",
  "private_key_id": "myprivekeyid",
  "private_key": "-----BEGIN PRIVATE KEY-----\MYPRIVATEKEY-----END PRIVATE KEY-----\n",
  "client_email": "proxy-user@myproject.iam.gserviceaccount.com",
  "client_id": "myclientid",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/proxy-user%40myproject.iam.gserviceaccount.com"
}

My question: How do I get rid of the errors/ get a proper google sql config from GKE?

At cluster creation I selected the mTLS 'permissive' option.

My config: myapp_and_router.yaml:

apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  ports:
  - port: 8089
    # 'name: http' apparently does not work
    name: db
  selector:
    app: myapp    
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: gcr.io/myproject/firstapp:v1
          imagePullPolicy: Always
          ports:
            - containerPort: 8089
          env:
            - name: POSTGRES_DB_HOST
              value: 127.0.0.1:5432
            - name: POSTGRES_DB_USER
              valueFrom:
                secretKeyRef:
                  name: mysecret
                  key: username
            - name: POSTGRES_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysecret
                  key: password
          ## Custom healthcheck for Ingress
          readinessProbe:
            httpGet:
              path: /healthz
              scheme: HTTP
              port: 8089
            initialDelaySeconds: 5
            timeoutSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              scheme: HTTP
              port: 8089
            initialDelaySeconds: 5
            timeoutSeconds: 20             
        - name: cloudsql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.11
          command: ["/cloud_sql_proxy",
                    "-instances=myproject:europe-west4:mydatabase=tcp:5432",
                    "-credential_file=/secrets/cloudsql/credentials.json"]
          securityContext:
            runAsUser: 2
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: cloudsql-instance-credentials
              mountPath: /secrets/cloudsql
              readOnly: true
      volumes:
        - name: cloudsql-instance-credentials
          secret:
            secretName: cloudsql-instance-credentials
---
###########################################################################
# Ingress resource (gateway)
##########################################################################
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: myapp-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      # 'name: http' apparently does not work
      name: db 
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "*"
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: myapp
      weight: 100
    websocketUpgrade: true
---

EDIT 1: I had not enabled permissions (scopes) for the various google services when creating the cluster, see here. After creating a new cluster with the permissions I now get a new errormessage:

kubectl logs mypod cloudsql-proxy
2019/01/11 20:39:58 using credential file for authentication; email=proxy-user@myproject.iam.gserviceaccount.com
2019/01/11 20:39:58 Listening on 127.0.0.1:5432 for myproject:europe-west4:mydatabase
2019/01/11 20:39:58 Ready for new connections
2019/01/11 20:40:12 New connection for "myproject:europe-west4:mydatabase"
2019/01/11 20:40:12 couldn't connect to "myproject:europe-west4:mydatabase": Post https://www.googleapis.com/sql/v1beta4/projects/myproject/instances/mydatabase/createEphemeral?alt=json: oauth2: cannot fetch token: 400 Bad Request
Response: {
  "error": "invalid_grant",
  "error_description": "Invalid JWT Signature." 
}

EDIT 2: Looks like new error was caused by the Service Accounts keys no longer being valid. After making new ones I can connect to the database!

回答1:

I saw similar errors but was able to get cloudsql-proxy working in my istio cluster on GKE by creating the following service entries (with some help from https://github.com/istio/istio/issues/6593#issuecomment-420591213):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: google-apis
spec:
  hosts:
  - "*.googleapis.com"
  ports:
  - name: https
    number: 443
    protocol: HTTPS
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: cloudsql-instances
spec:
  hosts:
  # Use `gcloud sql instances list` to get the addresses of instances
  - 35.226.125.82
  ports:
  - name: tcp
    number: 3307
    protocol: TCP

Also, I still saw those connection errors during initialization until I added a delay in my app startup (sleep 10 before running server) to give the istio-proxy and cloudsql-proxy containers time to get set up first.

EDIT 1: Here are logs with the errors, then the successful "New connection/Client closed" lines once things are working:

2019/01/10 21:54:38 New connection for "my-project:us-central1:my-db"
2019/01/10 21:54:38 Throttling refreshCfg(my-project:us-central1:my-db): it was only called 44.445553175s ago
2019/01/10 21:54:38 couldn't connect to "my-project:us-central1:my-db": Post https://www.googleapis.com/sql/v1beta4/projects/my-project/instances/my-db/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 108.177.112.84:443: getsockopt: connection refused
2019/01/10 21:54:38 New connection for "my-project:us-central1:my-db"
2019/01/10 21:54:38 Throttling refreshCfg(my-project:us-central1:my-db): it was only called 44.574562959s ago
2019/01/10 21:54:38 couldn't connect to "my-project:us-central1:my-db": Post https://www.googleapis.com/sql/v1beta4/projects/my-project/instances/my-db/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 108.177.112.84:443: getsockopt: connection refused
2019/01/10 21:55:15 New connection for "my-project:us-central1:my-db"
2019/01/10 21:55:16 Client closed local connection on 127.0.0.1:5432
2019/01/10 21:55:17 New connection for "my-project:us-central1:my-db"
2019/01/10 21:55:17 New connection for "my-project:us-central1:my-db"
2019/01/10 21:55:27 Client closed local connection on 127.0.0.1:5432
2019/01/10 21:55:28 New connection for "my-project:us-central1:my-db"
2019/01/10 21:55:30 Client closed local connection on 127.0.0.1:5432
2019/01/10 21:55:37 Client closed local connection on 127.0.0.1:5432
2019/01/10 21:55:38 New connection for "my-project:us-central1:my-db"
2019/01/10 21:55:40 Client closed local connection on 127.0.0.1:5432

EDIT 2: Ensure that Cloud SQL api is within scope of your cluster.



回答2:

Since MySQL and PostgreSQL are based on the TCP/IP protocol(or unix socket on a specific situation) and Postgres isn't using HTTP, the problem comes from the Service's port name,.

First, try to change the port name, you can change it to "db" as example. Another workaround is to use jdbc Socket Factory connecting to CloudSQL Mysql with slight difference :

  • No "Adresses field" /CIDR block in cloud-sql-instance service entry
  • Resolution: DNS for the service entry that allows the connection to the CloudSQL Instance