Around March 2011 I tested GAE (the Java version) as a potential platform for a massively parallel computation. The date is relevant because GAE is evolving all the time. I found that the application was effectively being throttled at a throughput gain of about 43.2x. Has anybody successfully used GAE for massively parallel computation, or achieved a much higher gain? For the purpose of this question, I will arbitrarily define massively parallel computation to mean a throughput gain greater than 1000x.
I used a desktop client that instantiated multiple threads to hit the URL. I was using GAE Task Queues. The application required very little input and produced very little output, whether Datastore or HTML, as it was designed to evaluate computational throughput.
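For illustration, here is a minimal sketch of the kind of multi-threaded client described above. It is not the original client: the class name LoadClient, the methods hitUrl and runAll, and the thread-pool approach are all assumptions made for this example, and the request itself is passed in as a Callable so the timing logic can be tried without a live app.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical client sketch; names and structure are assumptions.
final class LoadClient {

    /** Issues one GET request and returns the HTTP status code. */
    static int hitUrl(String url) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Submits `requests` copies of `request` to a pool of `threads` worker
     * threads and returns the wall-clock seconds until all have completed.
     */
    static double runAll(int requests, int threads, Callable<Integer> request)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                pending.add(pool.submit(request));
            }
            for (Future<Integer> f : pending) {
                f.get(); // blocks until this request has received its response
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1e9;
    }
}
```

A call such as LoadClient.runAll(2000, 50, () -> LoadClient.hitUrl(appUrl)) would then return the elapsed seconds for 2000 requests, where appUrl is a placeholder for the task-enqueuing URL of your own app and 50 is an arbitrary thread count.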
Since it is often advised to keep GAE tasks under 1 second (although it is not clear whether this recommendation applies to Task Queue tasks), I tried various permutations. Some of my results are included here. As you can see, even with 0.8-second tasks, consistent with the sub-1-second recommendation, the throughput gain peaked at 43.2x.
Elapsed      Tasks  SecondsOfWork   Total  Gain
Seconds  Requested        PerTask    Work

FLT (FEW LARGE TASKS)
     15         72              1      72   4.9
    103         72             20    1440  14.0
   1524         72            400   28800  18.9

MST (MANY SMALL TASKS)
     53       1000            0.8     800  15.1
     63       2000            0.8    1600  25.4
    127       4000            0.8    3200  25.2
    313       4000            0.8    3200  10.2
    258       8000            0.8    6400  24.8
    177       8000            0.8    6400  36.2  (Have 5% of tasks do nothing.)
     49       2000            0.8    1600  32.7  (Have 1% of tasks do nothing.)
     37       2000            0.8    1600  43.2  (Have 5% of tasks do nothing.)
     42       2000            0.8    1600  38.1  (Have 10% of tasks do nothing.)
    249       2000            0.8    1600   6.4  (Have 50% of tasks do nothing.)

MLT (MANY LARGE TASKS)
   6373       1000            200  200000  31.4
    380        200             60   12000  31.6
Note that it was inadvisable to go above 600 seconds for Task Queue tasks, so the highest I went was 400 seconds, to leave a margin of safety. In the cases where some tasks do nothing, the aim was to lower the average amount of work per task in order to influence the overall Google "accounting". So, for example, each of 2000 tasks has 0.8 seconds of work, but an extra 222 tasks have no work, meaning 10% of all tasks have no work.
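The padding arithmetic can be sketched as follows. This assumes the percentage is taken over all tasks, working and no-op together, which is what the 222-out-of-2222 example implies; the class and method names are made up for illustration.

```java
// Hypothetical helper; the "fraction of all tasks" reading is an assumption.
final class NoOpPadding {

    /** No-op tasks to add so that they make up `fraction` of all tasks. */
    static long noOpTasks(int workingTasks, double fraction) {
        // n / (workingTasks + n) = fraction  =>  n = workingTasks * f / (1 - f)
        return Math.round(workingTasks * fraction / (1.0 - fraction));
    }

    /** Average seconds of work per task once the no-op tasks are included. */
    static double averageWork(int workingTasks, double secondsPerTask, long noOps) {
        return workingTasks * secondsPerTask / (workingTasks + noOps);
    }
}
```

For 2000 working tasks at 10%, noOpTasks gives 222, and the average work per task drops from 0.8 seconds to about 0.72 seconds.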
Edit: @PeterRecore, the throughput gain I am measuring is totalWorkInSeconds divided by elapsedTimeInSeconds, measured at the client. The client makes the requests and measures the elapsed time until all the GAE tasks have finished, which each task signals by sending a trivially small response. As I said, Datastore use is minimal: each task writes a single trivially small object when it is done. Each task loops up to an integer proportional to secondsOfWorkPerTask.
GAE spinning up instances is part of the problem, and Google arguably worsens it by telling people that it prefers sub-second tasks. The problem is mitigated with large tasks, because instance startup is then a smaller percentage of the total cycles used.
I am trying to find out whether GAE in its current form can be used to create an application that achieves high values of throughput gain. In March 2011 it seemed unlikely. What about today? How would it be done, or how did you actually do it? What level of throughput gain was achieved?
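To make the gain definition concrete, here is a small sketch of the calculation. The class and method names are illustrative only; the formula is the totalWorkInSeconds / elapsedTimeInSeconds ratio described above, with total work taken as tasks times secondsOfWorkPerTask.

```java
// Illustrative helper for the gain ratio; names are assumptions.
final class ThroughputGain {

    /** Total seconds of work requested, divided by wall-clock seconds elapsed. */
    static double gain(int tasks, double secondsOfWorkPerTask, double elapsedSeconds) {
        double totalWorkInSeconds = tasks * secondsOfWorkPerTask;
        return totalWorkInSeconds / elapsedSeconds;
    }
}
```

With the best row of the table (2000 tasks of 0.8 seconds finishing in 37 seconds), gain(2000, 0.8, 37) comes to roughly 43.2. By the same formula, reaching the 1000x target with that workload would require all 2000 tasks to finish in about 1.6 seconds.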
App Engine really isn't designed for use as a backend for huge computing jobs - it's designed for fast efficient serving of scalable sites (and APIs, for that matter). What it does isn't optimized around what you're trying to achieve.