Can Google App Engine be used for massively parall

Posted 2020-02-29 06:22

In about March 2011 I tested GAE (the Java version) as a potential platform for massively parallel computation. The date matters because GAE is evolving all the time. I found that the application was effectively throttled at about 43.2X computational throughput. Has anybody successfully used GAE for massively parallel computation, or achieved a much higher computational gain? For the purpose of this question, I will arbitrarily define massively parallel computation to mean greater than 1000X computational throughput.

I used a desktop client that started multiple threads to hit the URL. I was using GAE Task Queues. The application required very little input and produced very little output, whether Datastore or HTML, because it was designed only to evaluate computational throughput.
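A minimal sketch of that kind of multi-threaded client harness, written in Python for brevity. The names `run_load` and `fake_request` are hypothetical, and `fake_request` is a stand-in that only sleeps: the real client would issue an HTTP request to the application's URL, which is not given in this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send_request, num_requests, num_threads):
    """Call send_request(i) for every i in range(num_requests) from a thread pool.

    Returns (responses, elapsed_seconds), with the elapsed time measured
    at the client from the first request until the last response.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        responses = list(pool.map(send_request, range(num_requests)))
    elapsed_seconds = time.monotonic() - start
    return responses, elapsed_seconds


def fake_request(i):
    # Hypothetical stand-in for one HTTP request to the application.
    # A real client would block here until the server sends its small response.
    time.sleep(0.01)
    return "done"


if __name__ == "__main__":
    responses, elapsed = run_load(fake_request, num_requests=20, num_threads=10)
    print(len(responses), "responses in", round(elapsed, 3), "seconds")
```

Passing the request function in as a parameter keeps the timing logic separate from the transport, so the same harness can be pointed at a stub or at a real endpoint.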

Since it is often advised to keep GAE tasks under 1 second (although it is not clear whether this recommendation applies to Task Queue tasks), I tried various permutations. Some of my results are included here. As you can see, even with 0.8-second tasks, consistent with the sub-1-second recommendation, throughput peaked at 43.2X.

Elapsed    Tasks        SecondsOf     Total   Gain
Seconds    Requested    WorkPerTask   Work    (X)

FLT (FEW LARGE TASKS)
15         72           1             72      4.9
103        72           20            1440    14.0
1524       72           400           28800   18.9

MST (MANY SMALL TASKS)
53         1000         0.8           800     15.1
63         2000         0.8           1600    25.4
127        4000         0.8           3200    25.2
313        4000         0.8           3200    10.2
258        8000         0.8           6400    24.8

177        8000         0.8           6400    36.2 (Have 5% of tasks do nothing.)

49         2000         0.8           1600    32.7 (Have 1% of tasks do nothing.)
37         2000         0.8           1600    43.2 (Have 5% of tasks do nothing.)
42         2000         0.8           1600    38.1 (Have 10% of tasks do nothing.)
249        2000         0.8           1600    6.4  (Have 50% of tasks do nothing.)

MLT (MANY LARGE TASKS)
6373       1000         200           200000  31.4
380        200          60            12000   31.6

Note that it was inadvisable to go above 600 seconds for Task Queue tasks, so the highest I went was 400 seconds, to leave a margin of safety. The cases where some tasks do nothing were meant to lower the average amount of work per task in order to influence the overall Google "accounting". For example, each of 2000 tasks has 0.8 seconds of work, and an extra 222 tasks have no work, meaning 10% of all tasks have no work.
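The padding arithmetic can be sketched as follows. This assumes the percentage is always taken over all tasks (working plus idle), which matches the 10% example; only that case is spelled out in the post, and the function names are hypothetical.

```python
def padding_tasks(working_tasks, idle_fraction):
    """Extra no-op tasks needed so that idle_fraction of ALL tasks do nothing."""
    if not 0 <= idle_fraction < 1:
        raise ValueError("idle_fraction must be in [0, 1)")
    # extra / (working + extra) = f  =>  extra = working * f / (1 - f)
    return round(working_tasks * idle_fraction / (1 - idle_fraction))


def average_work_per_task(working_tasks, seconds_per_task, extra_tasks):
    """Average seconds of work per task once the no-op tasks are counted."""
    return working_tasks * seconds_per_task / (working_tasks + extra_tasks)


if __name__ == "__main__":
    extra = padding_tasks(2000, 0.10)
    print(extra)  # 222, as in the 10% example
    print(round(average_work_per_task(2000, 0.8, extra), 2))  # 0.72
```

With 222 idle tasks added to 2000 tasks of 0.8 seconds each, the average work per task drops from 0.8 to about 0.72 seconds while the total work stays at 1600 seconds.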

Edit: @PeterRecore, I am measuring the throughput gain, which is totalWorkInSeconds divided by elapsedTimeInSeconds, and it is measured at the client. The client makes the requests and measures the elapsed time until all the GAE tasks finish, which each task indicates by sending a trivially small response.

I am trying to find out whether GAE in its current form can be used to create an application that achieves high values of throughput gain. In March 2011 it seemed unlikely. What about today? How would it be done, or how did you actually do it? What level of throughput gain was achieved?

As I said, Datastore use is minimal: each task writes a single trivially small object when it is done. Each task loops up to an integer proportional to secondsOfWorkPerTask. GAE spinning up instances is part of the problem, and Google sort of worsens it by telling people that they prefer sub-second tasks. The problem is mitigated with large tasks, because instantiation is then a smaller percentage of the cycles used.
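As a worked example of that definition, here is a small Python sketch that recomputes the gain for two rows of the results table. The function name `throughput_gain` is hypothetical; the inputs are taken from the table.

```python
def throughput_gain(tasks, seconds_of_work_per_task, elapsed_seconds):
    """Throughput gain = total work in seconds / elapsed seconds at the client."""
    total_work_seconds = tasks * seconds_of_work_per_task
    return total_work_seconds / elapsed_seconds


if __name__ == "__main__":
    # 2000 tasks of 0.8 s each, finished in 37 s (best MST row).
    print(f"{throughput_gain(2000, 0.8, 37):.1f}")    # 43.2
    # 72 tasks of 400 s each, finished in 1524 s (largest FLT row).
    print(f"{throughput_gain(72, 400, 1524):.1f}")    # 18.9
```

By the same formula, reaching the 1000X target on 1600 seconds of total work would require all tasks to finish within 1.6 seconds of elapsed time.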

1 answer
Root(大扎)
#2 · 2020-02-29 07:10

App Engine really isn't designed for use as a backend for huge computing jobs. It's designed for fast, efficient serving of scalable sites (and APIs, for that matter). What it does isn't optimized around what you're trying to achieve.
