Significant overhead with Django on Apache vs. the built-in dev server

Posted 2019-05-29 00:54

Question:

I'm running Django/Tastypie on a soon-to-be production environment, however I am noticing significant overhead using Apache vs. using the built-in dev server. Apache is MUCH slower.

Here are non-scientific bandwidth tests using ab:

Apache:

$ ab -n 100 -c 50 https://www.mydomain.com/api/v1/clock/?format=json
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.mydomain.com (be patient).....done


Server Software:        Apache/2.2.22
Server Hostname:        www.mydomain.com
Server Port:            443
SSL/TLS Protocol:       TLSv1/SSLv3,AES256-SHA,2048,256

Document Path:          /api/v1/clock/?format=json
Document Length:        295 bytes

Concurrency Level:      50
Time taken for tests:   1.313 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Non-2xx responses:      100
Total transferred:      49800 bytes
HTML transferred:       29500 bytes
Requests per second:    76.15 [#/sec] (mean)
Time per request:       656.634 [ms] (mean)
Time per request:       13.133 [ms] (mean, across all concurrent requests)
Transfer rate:          37.03 [Kbytes/sec] received

Connection Times (ms)
min  mean[+/-sd] median   max
Connect:       15  283 162.5    282     590
Processing:    55  324 148.4    306     622
Waiting:       55  319 150.2    305     621
Total:        110  607 138.8    619     712

Percentage of the requests served within a certain time (ms)
50%    619
66%    691
75%    692
80%    701
90%    709
95%    709
98%    711
99%    712
100%    712 (longest request)

Dev Server (manage.py runserver):

$ ab -n 100 -c 50 http://127.0.0.1:8000/api/v1/clock/?format=json
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        WSGIServer/0.1
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /api/v1/clock/?format=json
Document Length:        381 bytes

Concurrency Level:      50
Time taken for tests:   0.701 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      54500 bytes
HTML transferred:       38100 bytes
Requests per second:    142.59 [#/sec] (mean)
Time per request:       350.656 [ms] (mean)
Time per request:       7.013 [ms] (mean, across all concurrent requests)
Transfer rate:          75.89 [Kbytes/sec] received

Connection Times (ms)
min  mean[+/-sd] median   max
Connect:        0    1   1.9      0       7
Processing:    43   73  47.0     63     365
Waiting:       31   70  47.0     61     365
Total:         50   74  47.0     64     365

Percentage of the requests served within a certain time (ms)
50%     64
66%     67
75%     69
80%     71
90%     77
95%    101
98%    276
99%    365
100%    365 (longest request)

As you can see, the dev server is roughly ten times faster at the median response time. Even under higher load, it is handling about twice as many requests per second.

I've made the basic modifications to Apache to try to solve this, which seemed to help a bit, but is there something else I'm missing? The 'clock' endpoint I'm requesting is a very basic script with one straightforward database call, so there's nothing funky with joins or anything like that going on. It's using Tastypie, so the output is plain text/JSON. Something doesn't seem right, because requests through the dev server are drastically faster.

Here are my Apache settings. It's set up with the worker MPM, with mod_wsgi in daemon mode:

KeepAlive Off

<IfModule mpm_worker_module>
    StartServers         25
    MinSpareThreads      25
    MaxSpareThreads     300
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          300
    MaxRequestsPerChild   0
</IfModule>

WSGIRestrictEmbedded On

Virtual Host additions:

    WSGIDaemonProcess www.mydomain.com processes=4 threads=1
    WSGIProcessGroup www.mydomain.com
    WSGIScriptAlias / /var/www/domain/wsgi.py process-group=www.mydomain.com application-group=%{GLOBAL}
    WSGIPassAuthorization On

Python/Tastypie settings:

Debug = False
USE_I18N = False
USE_X_FORWARDED_HOST = True

It's running on a load-balanced AWS medium instance and this server isn't serving any static files such as images/css/js. I tried upping this on IOPS/x-large instance but there was no change. Database is on Amazon RDS. But all of that is the same when running the dev server, which tells me the hosting environment isn't the issue.

Any help would be greatly appreciated!! I'm really not worried too much about high load at this time. It's a JSON-based API, so all requests are text and pretty small. I'm most concerned about response times from a high level of small requests.

Thanks! Mark

EDIT:

I did a new ab test against Apache, mapping the DNS name to localhost. This is essentially the same as mapping it to 127.0.0.1, and it gives MUCH better results:

$ ab -n 100 -c 50 http://www.mydomain.com/api/v1/clock/?format=json
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.mydomain.com (be patient).....done


Server Software:        Apache/2.2.22
Server Hostname:        www.mydomain.com
Server Port:            80

Document Path:          /api/v1/clock/?format=json
Document Length:        381 bytes

Concurrency Level:      50
Time taken for tests:   0.666 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      55900 bytes
HTML transferred:       38100 bytes
Requests per second:    150.22 [#/sec] (mean)
Time per request:       332.841 [ms] (mean)
Time per request:       6.657 [ms] (mean, across all concurrent requests)
Transfer rate:          82.01 [Kbytes/sec] received

Connection Times (ms)
min  mean[+/-sd] median   max
Connect:        0    3   3.0      2       6
Processing:    38  258  92.6    308     357
Waiting:       33  254  92.9    303     354
Total:         44  261  90.6    310     363

Percentage of the requests served within a certain time (ms)
50%    310
66%    321
75%    323
80%    327
90%    336
95%    344
98%    362
99%    363
100%    363 (longest request)

So the initial test was going through the external load balancer. These numbers are OK; however, the fastest 50% of requests still average a 310 ms response time, which is comparable to my real-world external tests. The Django dev server's fastest 50% average 64 ms, even though Apache scales much better. Are there any suggestions for tweaking Apache so it can serve those initial requests in that range? I don't mind scaling horizontally with additional servers, but request time means everything to me.

Answer 1:

Have you considered using NGINX? It has given us a significant performance improvement when running with uWSGI.
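A minimal sketch of the kind of setup I mean: NGINX terminating HTTP and proxying to a uWSGI socket. The server name, socket path, module name, and worker counts below are all placeholders you would adapt to your own app:

```
# nginx site config (sketch; names and paths are placeholders)
server {
    listen 80;
    server_name www.mydomain.com;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/run/mydomain/uwsgi.sock;
    }
}
```

```
; uwsgi.ini (a starting point, not a tuned config)
[uwsgi]
module    = wsgi:application
socket    = /run/mydomain/uwsgi.sock
master    = true
processes = 4
threads   = 2
vacuum    = true
```

You would still want to benchmark with ab as above to compare against your current Apache numbers.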



Answer 2:

Your Apache MPM configuration is broken in various ways, and it is also way overkill relative to the number of requests you actually allow through to the mod_wsgi daemon process group, which is where your application is actually running. Under load, all you are going to do is create a large backlog and long response times, because your Django application cannot keep up when it is starved of the processes/threads needed to handle the load.
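To put numbers on the mismatch, using the values from the configuration in the question: the worker MPM accepts far more concurrent connections than the mod_wsgi daemon group can service.

```python
# Values copied from the Apache configuration in the question.
start_servers = 25        # StartServers
threads_per_child = 25    # ThreadsPerChild
max_clients = 300         # MaxClients

# Threads Apache tries to start immediately -- already more than
# MaxClients permits, which is one of the broken aspects.
initial_threads = start_servers * threads_per_child
print(initial_threads)    # 625

# Concurrent requests the Django app can actually handle in the
# mod_wsgi daemon group (WSGIDaemonProcess processes=4 threads=1).
wsgi_slots = 4 * 1
print(wsgi_slots)         # 4

# Every accepted connection beyond this queues up behind the
# daemon processes instead of being handled.
print(max_clients - wsgi_slots)   # 296
```

With ab running at a concurrency of 50, that means most requests spend their time waiting for one of only 4 slots.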

Using ab with only 100 requests per test is also going to distort the results, as you are likely measuring warm-up time for Apache while it creates additional worker processes. Initially you will also be counting the load time of your Django application itself.

I would suggest you watch my two PyCon talks, which cover Apache/mod_wsgi configuration and using performance monitoring to work out where the bottlenecks are. That may give you some context as to why you are having problems.

  • http://lanyrd.com/2012/pycon/spcdg/
  • http://lanyrd.com/2013/pycon/scdyzk/
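As a rough starting point only (the right numbers depend entirely on your traffic and per-request work), a configuration along these lines would trim the over-provisioned MPM and give the daemon group enough threads to match the benchmark's concurrency. Every value here is an assumption to be tuned, not a recommendation:

```
# Sketch only; tune for your workload.
<IfModule mpm_worker_module>
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

WSGIRestrictEmbedded On

# Give the daemon group enough capacity for the expected concurrency,
# e.g. 4 processes x 15 threads = 60 concurrent requests.
WSGIDaemonProcess www.mydomain.com processes=4 threads=15
WSGIProcessGroup www.mydomain.com
```

Then rerun ab with a much larger request count (thousands, not 100) so process start-up and application load time are amortized out of the measurement.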