Over the past week I've been seeing the number of instances on my GAE Flexible Environment fall to 0, with no new instance spinning up. My understanding of the Flexible environment is that this shouldn't be possible... (https://cloud.google.com/appengine/docs/the-appengine-environments)
I was wondering if anyone else has been seeing these issues, or if they've solved the problem on their end before. My one hypothesis is that this might be an issue with my health monitoring endpoints, but haven't seen anything that jumps out as a problem when I review the code.
This hasn't been a problem for me until last week, and now it seems like I have to redeploy my environment (with no changes) every couple of days just to "reset" the instances. It's worth noting that I have two services under this same App Engine project, both running flexible versions. But I only seem to have this issue with one of the services (what I call the worker service).
Screenshot from App Engine UI:
Screenshot from Logs UI that shows the SIGTERM being sent:
PS - Could this have anything to do with the recent Google Compute issues that have been coming up... https://news.ycombinator.com/item?id=18436187
Edit: Adding the yaml file for "worker" service. Note that I'm using Honcho to add an endpoint to monitor health of the worker service via Flask. I added those code examples as well.
yaml File
service: worker
runtime: python
threadsafe: yes
env: flex
entrypoint: honcho start -f /app/procfile worker monitor
runtime_config:
python_version: 3
resources:
cpu: 1
memory_gb: 4
disk_size_gb: 10
automatic_scaling:
min_num_instances: 1
max_num_instances: 20
cool_down_period_sec: 120
cpu_utilization:
target_utilization: 0.7
Procfile for Honcho
default: gunicorn -b :$PORT main:app
worker: python tasks.py
monitor: python monitor.py /tmp/psq.pid
monitor.py
import os
import sys
from flask import Flask
# The app checks this file for the PID of the process to monitor.
PID_FILE = None
# Create app to handle health checks and monitor the queue worker. This will
# run alongside the worker, see procfile.
monitor_app = Flask(__name__)
@monitor_app.route('/_ah/health')
def health():
"""
The health check reads the PID file created by tasks.py main and checks the proc
filesystem to see if the worker is running.
"""
if not os.path.exists(PID_FILE):
return 'Worker pid not found', 503
with open(PID_FILE, 'r') as pidfile:
pid = pidfile.read()
if not os.path.exists('/proc/{}'.format(pid)):
return 'Worker not running', 503
return 'healthy', 200
@monitor_app.route('/')
def index():
return health()
if __name__ == '__main__':
PID_FILE = sys.argv[1]
monitor_app.run('0.0.0.0', 8080)