I have a small Java app (2-10 qps) that is set to automatic scaling with F4_1G instances. Interestingly, although normally only one instance is really active, two instances are usually created. Sometimes, after a few hours, an instance disappears and is immediately replaced by another 1-2 instances, with the corresponding instance startup spiking latency a lot. Is there any way to find out why an instance is shut down? I don't see any _ah/stop (which I think is normal for automatic scaling), nor any messages about exceeding memory limits, moving to another system, or any other errors, just big latencies when the change happens. The instances use around 250MB of memory, which is a lot less than 1GB, and latencies are otherwise very low (average 80ms).
I also tried basic scaling, where there are fewer restarts, but some still happen. I can see the _ah/stop there, but still no error message saying why the instance was stopped (e.g., I searched the log for "move", "exceed", and "memory").
From what I could find here on Stack Overflow, I could not really see where this would show up. It would be in the log, right? Any other ideas on how to figure out what the problem could be?
I ran into the same issue a few months back. Instances were being shut down and new instances were spawned, even though CPU and memory usage were within bounds, there was no particular issue with the instance itself or its response latencies, and there were no traffic spikes. After much observation and research, I noticed that the instances were being restarted after having served 50000 requests (or very slightly more).
There seems to be an undocumented hard limit on the number of requests an instance serves before it is restarted, in my case 50000 (with F4 or F4_1G instances on App Engine Java standard). Others have come to the same conclusion (see here for instance).
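If you want to check whether your own instances follow the same pattern, one option is to tally requests per instance from the request log. The snippet below is only a minimal sketch of that idea: it assumes you have already exported the log and pulled out the instance ID of each request entry into a list (the class name, method name, and sample IDs are made up for illustration, and how you extract the IDs is up to you).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InstanceRequestTally {

    /**
     * Counts how many requests each instance served.
     * Input: one instance ID per request log entry (assumed to be extracted beforehand).
     * Output: instance ID -> number of requests, in order of first appearance.
     */
    public static Map<String, Long> countPerInstance(List<String> instanceIds) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String id : instanceIds) {
            // Add 1 to the running total for this instance, starting at 1 if unseen.
            counts.merge(id, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; real input would come from your exported logs.
        List<String> ids = Arrays.asList("inst-a", "inst-a", "inst-b", "inst-a", "inst-b");
        countPerInstance(ids).forEach(
            (id, n) -> System.out.println(id + " served " + n + " requests"));
    }
}
```

If the totals for instances that disappeared all cluster just above the same number, that would be consistent with a request-count limit rather than a memory or CPU problem. It would not prove it, though, since the limit itself is undocumented.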
Probably too late for you after two years, but I hope this helps others that might end up here in the future.