I am trying to create a simple web app using Python on GAE. The app needs to spawn a few threads for each request it receives. For this I am using Python's threading library. I start all the threads and then wait on them.
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
The application runs fine, except that the threads run serially rather than concurrently (I confirmed this by printing timestamps at the beginning and end of each thread's run() method). I have followed the instructions at http://code.google.com/appengine/docs/python/python27/using27.html#Multithreading to enable multithreading.
My app.yaml looks like:
application: myapp
version: 1
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico
- url: /stylesheet
  static_dir: stylesheet
- url: /javascript
  static_dir: javascript
- url: /pages
  static_dir: pages
- url: .*
  script: main.app
I made sure that my local GoogleAppLauncher uses Python 2.7 by setting the path explicitly in the preferences.
My threads have an average run time of 2-3 seconds, during which they make a URL open call and do some processing on the result.
Am I doing something wrong, or missing some configuration to enable multithreading?
Are you experiencing this in the dev_appserver or after uploading your app to the production service? From your mention of GoogleAppLauncher it sounds like you may be seeing this in the dev_appserver; the dev_appserver does not emulate the threading behavior of the production servers, and you'd be surprised to find that it works just fine after you deploy your app. (If not, add a comment here.)
Another idea: if you are mostly waiting for the urlfetch, you can run many urlfetch calls in parallel by using the async interface to urlfetch: http://code.google.com/appengine/docs/python/urlfetch/asynchronousrequests.html
This approach does not require threads. (It still doesn't properly parallelize the requests in the dev_appserver; but it does do things properly on the production servers.)
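As a rough sketch of that pattern (not tested against your app): start every fetch first and only then collect the results, so the waits overlap. The helper name fetch_all and the choice to pass the urlfetch module in as an argument are my own assumptions, made so the snippet can be tried outside App Engine; inside a handler you would pass google.appengine.api.urlfetch. The calls it relies on are create_rpc, make_fetch_call and get_result.

```python
def fetch_all(urls, urlfetch, deadline=10):
    """Issue all fetches up front, then wait for each result.

    `urlfetch` is expected to be the google.appengine.api.urlfetch module
    (passed in here so the function can be exercised with a stand-in).
    """
    pending = []
    for url in urls:
        rpc = urlfetch.create_rpc(deadline=deadline)
        urlfetch.make_fetch_call(rpc, url)  # returns immediately
        pending.append((url, rpc))

    results = {}
    for url, rpc in pending:
        # Blocks only until this particular fetch has finished; the other
        # fetches keep running in the background in the meantime.
        results[url] = rpc.get_result()
    return results
```

In a request handler this might be called as fetch_all(my_urls, urlfetch) after "from google.appengine.api import urlfetch", where my_urls is whatever list of URLs your threads currently fetch. Each value in the returned dict is the response object for that URL.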
If your threads are mostly waiting for datastore operations, you may try the NDB module that's part of 1.6.2. The semantics will be close enough to what you are doing.
IIRC, the multithreading flag enables one server instance to serve multiple requests on separate threads, but won't allow you to start threads yourself. If you didn't need to sync them before returning, you could put them on separate tasks and delegate them to one or more task queues.
The multithreading notes for GAE are merely for how requests are handled - they don't fundamentally change how Python threads work. Specifically, the "CPython Implementation Detail" note in the threading module docs still applies.
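That note says only one thread executes Python code at a time in CPython, but threads that are blocked on I/O can still overlap. Here is a minimal sketch of the timestamp check described in the question, which you could run on a plain Python install to see what overlapping threads look like. The names timed_worker and run_concurrently are made up for illustration, and time.sleep is only a stand-in for a blocking call such as a URL fetch.

```python
import threading
import time


def timed_worker(name, seconds, log):
    """Record when this thread started and finished its blocking wait."""
    start = time.time()
    time.sleep(seconds)  # stand-in for blocking I/O; releases the GIL
    log.append((name, start, time.time()))


def run_concurrently(delays):
    """Start one thread per delay, join them all, return (elapsed, log)."""
    log = []
    threads = [
        threading.Thread(target=timed_worker, args=("t%d" % i, delay, log))
        for i, delay in enumerate(delays, 1)
    ]
    began = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - began, log


if __name__ == "__main__":
    elapsed, log = run_concurrently([0.2, 0.2, 0.2])
    for name, start, end in sorted(log):
        print("%s: started %.3f, finished %.3f" % (name, start - log[0][1], end - log[0][1]))
    print("total: %.3f s" % elapsed)
```

If the threads overlap, the total is close to the longest single delay; if they run one after another, it is close to the sum of the delays.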
It's also worth mentioning the note in the "Sandboxing" section of the GAE docs: