I have a paid app on Google App Engine. As you all know, Google charges for instance hours.
What I have not been able to understand is the logic that causes Google to warm up another instance, and how it decides to balance traffic between those instances.
As you can see from the screenshot of the App Engine instances screen (sorry for the link; I am new to Stack Overflow and was not allowed to post an actual image), I keep one instance resident at all times so that my users won't suffer from long loading requests.
The funny thing is that none of the traffic appears to reach the resident instance; all of it goes to one of the dynamic instances. Moreover, even if we assume that, according to the load balancing algorithm, that dynamic instance is overwhelmed, the scheduler did not direct traffic to the resident instance. Instead, it warmed up another dynamic instance, which does not appear to get much traffic either.
If I were not paying for triple the instance hours, I wouldn't care. Unfortunately, I need to pay for these hours :)
I would appreciate it if anyone could shed some more light on the following:
1. How does GAE's load balancing work?
2. What can I do to get a better distribution of traffic across my instances (and thereby reduce the number of dynamic instances running at a given time)?
Thanks for the help!
This documentation page explains how the App Engine scheduler works, and this other page describes which parameters you can change to control performance.
That said, we are working on improving the scheduling behavior. Stay tuned by following the Google Cloud Platform blog. (I work on App Engine.)
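For illustration only, here is a rough sketch of how such tuning parameters might look when set in `app.yaml` under `automatic_scaling`. This assumes an app on the standard environment that uses automatic scaling; the values are made-up placeholders rather than recommendations, and the settings available to you may differ (they may be exposed in the admin console instead), so check the linked pages before relying on any of them.

```yaml
# Hypothetical app.yaml fragment; all values are illustrative assumptions.
automatic_scaling:
  # Keep one idle (resident) instance warm to absorb sudden load.
  min_idle_instances: 1
  # Cap the number of idle instances you are billed for.
  max_idle_instances: 1
  # Let requests wait in the pending queue a bit longer before
  # the scheduler starts another dynamic instance.
  min_pending_latency: 500ms
  max_pending_latency: automatic
```

Roughly speaking, raising the minimum pending latency and capping idle instances trades some request latency for fewer billed instance hours.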