I was experimenting with concurrent request handling on a few platforms.
The aim of the experiment was to get a rough measure of the capacity bounds of some selected technologies.
I set up a Linux VM on my machine with a basic Go HTTP server (the vanilla http.HandleFunc of the default http package).
On each request the server would compute a modified version of the fasta algorithm, restricted to a single thread and process, and return the result. N was set to 100000.
The algorithm runs in roughly 2 seconds.
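For reference, a minimal sketch of that setup (fastaN is a hypothetical stand-in for the actual single-threaded fasta implementation, and the port is arbitrary):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// fastaN is a stand-in for the modified, single-threaded fasta
// implementation described above; the real computation is omitted.
func fastaN(n int) []byte {
	return []byte(fmt.Sprintf("fasta result for n=%d", n))
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Each request runs the computation and writes back the result.
	w.Write(fastaN(100000))
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```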
I used the same algorithm and logic in a Google App Engine project.
The code is identical; the only difference is that the handler is set up in init()
instead of main(),
as per GAE requirements.
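On the classic GAE Go runtime, that amounts to registering the handler in init() and dropping main(), since the platform owns the serving loop. A minimal sketch, with the package name illustrative and fastaN again a hypothetical stand-in:

```go
package fastaserver

import (
	"fmt"
	"net/http"
)

// fastaN is the same stand-in for the single-threaded fasta computation.
func fastaN(n int) []byte {
	return []byte(fmt.Sprintf("fasta result for n=%d", n))
}

// On the classic GAE Go runtime the platform provides main() and
// starts the server, so handler registration happens in init().
func init() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write(fastaN(100000))
	})
}
```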
On the other end, an Android client spawns 500 threads, each one issuing a GET
request in parallel to the fasta-computing server, with a request timeout of 5000 ms.
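The actual client was Android code, but the load pattern is roughly equivalent to this Go sketch (the URL is a placeholder):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	const n = 500
	// 5000 ms per-request timeout, matching the experiment.
	client := &http.Client{Timeout: 5000 * time.Millisecond}

	var wg sync.WaitGroup
	var mu sync.Mutex
	ok, failed := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Placeholder URL; substitute the fasta server's address.
			resp, err := client.Get("http://example.com/")
			if err != nil {
				mu.Lock()
				failed++
				mu.Unlock()
				return
			}
			io.Copy(io.Discard, resp.Body) // drain the body fully
			resp.Body.Close()
			mu.Lock()
			ok++
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Printf("ok=%d failed=%d\n", ok, failed)
}
```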
I was expecting the GAE application to scale and answer every request, and the local Go server to fail on some of the 500 requests, but the results were the opposite: the local server correctly replied to every request within the timeout bounds, while the GAE application handled only 160 requests out of 500. The remaining requests timed out.
I checked the Cloud Console and verified that 18 GAE instances had been spawned, but the vast majority of requests still failed.
I thought most of them failed because of the start-up time of each GAE instance, so I repeated the experiment right after, but got the same results: most of the requests timed out.
I was expecting GAE to scale to accommodate ALL the requests, believing that if a single local VM could successfully reply to 500 concurrent requests, GAE would be able to do the same. But this is not what happened.
The GAE console doesn't show any error and correctly reports the number of incoming requests.
What could be the cause of this? Also, if a single instance on my machine could handle all the incoming requests using nothing but goroutines, how come GAE needed to scale so much at all?
Extending Alexander's answer.
The GAE scaling logic is based on incoming traffic trend analysis.
The key to handling your case - sudden spikes in traffic (which can't be taken into account by the trend analysis because of how fast they ramp up) - is to have sufficient resident (idle) instances configured for your application to absorb such traffic until GAE spins up additional dynamic instances. GAE can handle peaks as high as you want (if your pockets are deep enough).
See Scaling dynamic instances for more details.
To make optimal use of GAE in terms of minimizing costs, you need to configure a few things in app.yaml (see the sample file below):

- threadsafe: true - actually this comes from the Python config and is not applicable to Go, but I would set it just in case.
- max_concurrent_requests - set to the maximum of 80.
- max_idle_instances - set to the minimum of 0.
- max_pending_latency - set to automatic, or greater than min_pending_latency.
- min_idle_instances - set to 0.
- min_pending_latency - set to a higher number. If you are OK with 1 second of latency and your handlers take on average 100 ms to process, set it to 900 ms.

Then you should be able to process a lot of requests on a single instance.
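Putting it together, a sample app.yaml along these lines (values are illustrative, assuming the standard environment, not prescriptive):

```yaml
runtime: go
api_version: go1
threadsafe: true                  # from the Python config; harmless to set

automatic_scaling:
  max_concurrent_requests: 80     # the maximum allowed
  max_idle_instances: 0
  min_idle_instances: 0
  max_pending_latency: automatic  # or anything above min_pending_latency
  min_pending_latency: 900ms      # OK with ~1s latency, ~100ms handlers
```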
If you are OK with burning cash for the sake of responsiveness & scalability, increase
min_idle_instances
& max_idle_instances.

Also, do you use similar instance types for the VM and for GAE? The GAE F1 instance is not too fast and is better suited to async tasks like working with IO (datastore, http, etc.). You can configure a more powerful instance class to scale better for computation-intensive tasks.
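For example (assuming the standard environment; F4 here is just an illustrative choice):

```yaml
instance_class: F4   # more CPU than the default F1
```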
Also, do you test on a paid account? Free accounts have quotas, and App Engine will refuse a percentage of requests if it believes the load would exceed the daily quota were it to continue in the same pattern.
Thanks to everyone for their help. Many interesting points and insights came out of the answers to this question.
The fact that the Cloud Console was reporting no errors led me to believe that the bottleneck was occurring after the actual request processing.
I found the reason why the results were not as expected: bandwidth.
Each response had a payload of roughly 1 MB, so responding to 500 simultaneous connections from the same client would clog the lines, resulting in timeouts. This obviously did not happen when requesting from the VM, where the bandwidth is much larger.
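Back-of-the-envelope (assuming, purely for illustration, a 100 Mbit/s downlink on the client side): 500 responses × 1 MB ≈ 500 MB ≈ 4,000 Mbit, which would take roughly 40 seconds to transfer in total, far beyond the 5,000 ms per-request timeout, so only a fraction of the responses can complete in time.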
Now GAE's scaling is in line with what I expected: it successfully scales to accommodate every incoming request.