I have a highly trafficked application on one Debian machine, and Apache has started acting strangely.
Every time I start Apache, tons of Apache processes are spawned, the app doesn't load at all, and very quickly the whole machine freezes and must be power-cycled to reboot.
Here is what I get for top immediately after starting apache:
top - 20:14:44 up 1:16, 2 users, load average: 0.48, 0.10, 0.03
Tasks: 330 total, 5 running, 325 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 21.4%sy, 0.0%ni, 65.7%id, 0.2%wa, 0.1%hi, 0.7%si, 0.0%st
Mem: 8179920k total, 404984k used, 7774936k free, 60716k buffers
Swap: 2097136k total, 0k used, 2097136k free, 43424k cached
10251 www-data 15 0 467m 8100 4016 S 6 0.1 0:00.04 apache2
10262 www-data 15 0 467m 8092 4012 S 6 0.1 0:00.05 apache2
10360 www-data 15 0 468m 8296 4016 S 6 0.1 0:00.05 apache2
10428 www-data 15 0 468m 8272 3992 S 6 0.1 0:00.05 apache2
10241 www-data 15 0 467m 8256 4012 S 4 0.1 0:00.03 apache2
10259 www-data 15 0 467m 8092 4012 S 4 0.1 0:00.04 apache2
10274 www-data 15 0 467m 8056 4012 S 4 0.1 0:00.03 apache2
10291 www-data 15 0 468m 8292 4012 S 4 0.1 0:00.03 apache2
10293 www-data 15 0 468m 8292 4012 S 4 0.1 0:00.03 apache2
10308 www-data 15 0 468m 8296 4016 S 4 0.1 0:00.02 apache2
10317 www-data 15 0 468m 8292 4012 S 4 0.1 0:00.02 apache2
10320 www-data 15 0 468m 8292 4012 S 4 0.1 0:00.04 apache2
10325 www-data 15 0 468m 8292 4012 S 4 0.1 0:00.04 apache2
And so forth, with more apache2 processes.
Less than a minute later, you can see below that the load has gone from 0.48 to 2.17. If I do not stop apache at this point, the load continues to rise over a few minutes or less until the machine dies.
top - 20:15:34 up 1:17, 2 users, load average: 2.17, 0.62, 0.21
Tasks: 1850 total, 5 running, 1845 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 2.1%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.1%hi, 1.0%si, 0.0%st
Mem: 8179920k total, 1938524k used, 6241396k free, 60860k buffers
Swap: 2097136k total, 0k used, 2097136k free, 44196k cached
We have a firewall where we whitelist the addresses we know are allowed to hit our site.
Any ideas about what the problem might be are very welcome.
Thanks!
You have probably made the error of configuring Apache to use far more than all of your RAM. This is an easy mistake to make.
I am assuming you are using a Prefork Apache, and an in-process application server (such as PHP or mod_perl). In this model, you will end up with a maximum of (MaxClients * max memory usage of your application per process) memory used. If you don't have nearly that much, it's time to decrease one, the other or both.
In the general case, this means decreasing MaxClients to the point where your server has enough RAM to cope.
The default values typically used for MaxClients (150 is typical) are not suitable for running an in-process heavyweight application server on a modest machine if you are using the Prefork model (most application servers either don't support, or discourage, the use of threaded models).
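To put rough numbers on the formula above, here is a minimal Python sketch. The 8179920k total comes from the top output in the question, and 150 is the typical default mentioned above. The per-process figures (8 MB and 50 MB) and the MaxClients value of 1500 are illustrative assumptions, not measurements of the asker's application.

```python
# Worst case for Prefork Apache: MaxClients * peak memory per process.
TOTAL_RAM_MB = 8179920 / 1024  # "Mem: 8179920k total" from the question's top output


def worst_case_mb(max_clients, per_process_mb):
    """Upper bound on memory if every allowed process reaches its peak."""
    return max_clients * per_process_mb


def fits_in_ram(max_clients, per_process_mb, total_ram_mb=TOTAL_RAM_MB):
    """True if the worst case still fits in physical RAM."""
    return worst_case_mb(max_clients, per_process_mb) <= total_ram_mb


if __name__ == "__main__":
    # Assumed values: 150 is the typical default, 1500 is a hypothetical bad setting.
    for max_clients in (150, 1500):
        # Assumed per-process sizes in MB (hypothetical, measure your own).
        for per_process_mb in (8, 50):
            need = worst_case_mb(max_clients, per_process_mb)
            verdict = "fits" if fits_in_ram(max_clients, per_process_mb) else "does NOT fit"
            print(f"MaxClients={max_clients}, {per_process_mb} MB each: "
                  f"{need} MB needed, {verdict} in {TOTAL_RAM_MB:.0f} MB")
```

The point of the sketch is only that the product grows quickly: a per-process size that looks harmless becomes a problem once MaxClients is large.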
However, decreasing MaxClients will eventually cause the application to become unavailable, particularly if you have keepalives on and the keepalive timeout too long. Processes which are just keeping a connection alive (state K in server-status) still use a lot of RAM, and that may be a problem - try to minimise keepalive timeout, or turn it off altogether.
You need to keep an eye on server-status (as provided by mod_status).
Of course you should only make ANY of these changes if you understand the consequences. Think twice, change the config once. If you have ANY ability to test the changes with simulated load on a similar spec non-production machine, do so.
Use ps aux | grep apache to see how many processes Apache is running. Look at the "RSS" column, which gives an estimate of the memory used by each process. Alternatively, you can use "top": press shift + f, then select the %MEM column to sort the processes by memory usage.
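If you would rather have the count and the RSS totals computed for you, something like the following Python sketch could work. It assumes you feed it the text produced by ps -eo rss,comm (RSS in kilobytes, then the command name); the sample text below is made up from three RES values in the question's top output, and the function name is my own.

```python
def summarize_rss(ps_output, name="apache2"):
    """Return (process count, total RSS in KB, largest RSS in KB) for `name`.

    Expects lines of the form "<rss_kb> <command>", as printed by
    `ps -eo rss,comm`. Header and unrelated lines are skipped.
    """
    rss_values = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and parts[1].strip() == name:
            rss_values.append(int(parts[0]))
    if not rss_values:
        return 0, 0, 0
    return len(rss_values), sum(rss_values), max(rss_values)


# Hypothetical sample; on a real machine you could obtain the text with
# subprocess.run(["ps", "-eo", "rss,comm"], capture_output=True, text=True).stdout
SAMPLE = """\
  RSS COMMAND
 1234 sshd
 8100 apache2
 8092 apache2
 8296 apache2
"""

if __name__ == "__main__":
    count, total_kb, max_kb = summarize_rss(SAMPLE)
    print(f"{count} apache2 processes, {total_kb} KB total, largest {max_kb} KB")
```

Keep in mind that RSS counts shared pages in every process, so the sum overstates the real footprint somewhat.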
The maximum number of processes is set by the "MaxClients" directive in your Apache configuration file. One way to arrive at this figure is described by this page:
- SSH into your server as root.
- Run top.
- Press shift + m.
- Note the highest RES memory used by httpd.
- Hit Q to exit top.
- Execute: service httpd stop (on Debian: sudo service apache2 stop)
- Once httpd is stopped, execute: free -m
- Note the memory listed under "used".
- Find the guaranteed memory for your VPS plan. Support can tell you how much you have guaranteed if you cannot find it.
- Subtract the memory USED from the memory that your plan is GUARANTEED. This will give you your base FREE MEMORY POOL.
- Multiply the value of your FREE MEMORY POOL by 0.8 to find your average AVAILABLE APACHE POOL (this will allow you a 20% memory reserve for burst periods).
- Divide your AVAILABLE APACHE POOL by the highest RES memory used by httpd. This will give you the MaxClients value that should be set for your system. (Round down if the result has a fractional part.)
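The arithmetic in the steps above can be sketched in a few lines of Python. The function and parameter names are mine, and the example numbers (8000 MB guaranteed, 400 MB used with Apache stopped, 50 MB highest RES) are hypothetical; substitute your own measurements.

```python
import math


def suggested_max_clients(guaranteed_mb, used_mb_apache_stopped,
                          highest_res_mb, pool_fraction=0.8):
    """MaxClients estimate following the steps listed above.

    guaranteed_mb          -- memory guaranteed by your plan (or physical RAM)
    used_mb_apache_stopped -- "used" from `free -m` while Apache is stopped
    highest_res_mb         -- largest RES seen for an Apache process in top
    pool_fraction          -- 0.8 keeps a 20% reserve for burst periods
    """
    free_pool = guaranteed_mb - used_mb_apache_stopped   # base FREE MEMORY POOL
    apache_pool = free_pool * pool_fraction              # AVAILABLE APACHE POOL
    return math.floor(apache_pool / highest_res_mb)      # round down


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the question.
    print(suggested_max_clients(8000, 400, 50))
```

Treat the result as a starting point: the highest RES you happened to see in top may not be the true peak under load.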
The right value for "MaxClients" keeps Apache's total memory use within what the server actually has. That's how I solved it.
On Debian, the Apache configuration file is at /etc/apache2/apache2.conf
Have you changed your configuration file recently? If yes, I trust you kept the old version for diffing?
If not, search for the "StartServers", "MaxSpareServers" and "MinSpareServers" directives. Generally you want to leave these at defaults, but it's possible that they were intentionally set high (bad idea) or accidentally set that way due to a bad config edit.
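For a quick way to pull those directives out of a config file, here is a rough Python sketch. The excerpt is a hypothetical example, not the asker's configuration, and the parser is deliberately naive: it ignores which <IfModule> block a directive sits in and does not follow Include lines, so treat its output as a hint only.

```python
WATCHED = ("StartServers", "MinSpareServers", "MaxSpareServers", "MaxClients")


def find_directives(conf_text, names=WATCHED):
    """Return {directive: int value} for watched directives found in conf_text.

    Apache directive names are case-insensitive, so matching is too.
    If a directive appears more than once, the last occurrence wins.
    """
    canonical = {name.lower(): name for name in names}
    found = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split()
        key = parts[0].lower()
        if key in canonical and len(parts) >= 2 and parts[1].isdigit():
            found[canonical[key]] = int(parts[1])
    return found


# Hypothetical excerpt for illustration only.
SAMPLE_CONF = """\
# prefork settings
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients         1500
    MaxRequestsPerChild   0
</IfModule>
"""

if __name__ == "__main__":
    for name, value in find_directives(SAMPLE_CONF).items():
        print(f"{name} = {value}")
```

On a real system you would read the text from /etc/apache2/apache2.conf and compare the values against your old copy of the file.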
If this doesn't help, it's time to look outside Apache, for some process that's opening connections at a fast rate (could be that there's a testing process that's run amok).
First step is the access log. Second step is to run netstat, to see where the connections might be coming from. And if it's running on the same system, you can look in /proc/*/fd to find the two ends of the connection.
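To see quickly where connections are coming from, you can tally the remote addresses in the netstat output. The Python sketch below assumes the column layout printed by netstat -tn on Linux (foreign address in the fifth column); the sample text uses documentation-range IP addresses and is entirely made up.

```python
from collections import Counter


def connections_per_remote(netstat_output):
    """Count TCP connections per remote address in `netstat -tn` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            remote_host = fields[4].rsplit(":", 1)[0]  # strip the port
            counts[remote_host] += 1
    return counts


# Hypothetical sample output for illustration only.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:80             192.0.2.10:51234        ESTABLISHED
tcp        0      0 10.0.0.5:80             192.0.2.10:51235        ESTABLISHED
tcp        0      0 10.0.0.5:80             198.51.100.7:40022      TIME_WAIT
tcp6       0      0 ::1:80                  ::1:51000               ESTABLISHED
"""

if __name__ == "__main__":
    for host, n in connections_per_remote(SAMPLE).most_common():
        print(f"{n:5d}  {host}")
```

A single address with hundreds of connections, especially a local one, would point at the runaway process mentioned above.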
This question is ancient, but I feel compelled to add an answer here because all of the existing answers are overlooking a key piece of information from the OP: After the load has begun to rise for a few minutes, top reports that there are still ample CPU & memory resources available. There is usually one culprit remaining, and that's I/O.
Check if there is a full partition with df -h. If not, see if your application is thrashing the disk using vmstat 1 10 or iostat 1 10 (on Debian/Ubuntu, iostat is provided by the 'sysstat' package; vmstat comes with 'procps'). If you still don't see an issue there, perhaps you have device-level I/O errors or network trouble for network-mounted storage. Check the system and daemon log files.
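If you want the full-partition check in script form, here is a small Python sketch using shutil.disk_usage from the standard library. The list of paths and the 90% threshold are assumptions; also note that df computes its percentage slightly differently (it accounts for blocks reserved for root), so the numbers may not match df -h exactly.

```python
import os
import shutil


def percent_used(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return (path, percent) pairs for paths at or above the threshold."""
    result = []
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            result.append((path, round(pct, 1)))
    return result


if __name__ == "__main__":
    # Assumed mount points; adjust to wherever your logs and data live.
    candidates = [p for p in ("/", "/var", "/tmp") if os.path.isdir(p)]
    for path in candidates:
        print(f"{path}: {percent_used(path):.1f}% used")
    print("nearly full:", nearly_full(candidates))
```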
As has been said (assuming Prefork Apache) - MaxClients = max processes at once.
If you find you are getting hammered with real traffic (and not a mis-configured StartServers/Min/MaxSpareServers), there are some other things you can do:
- Set up a separate, lightweight apache process (or lighttpd) for your static content. That way all the small, static stuff doesn't "pollute" your heavy-weight app process. This can be on the same server, or a different one. Doesn't matter.
- Put a reverse proxy like Squid in front of your Apache process. The reverse proxy will quickly suck down the content from Apache and store it in memory and then parcel it back out to the client. This way AOL users on 14.4kb modems don't hog one of your valuable Apache slots. As a bonus, such a setup can be configured to cache some of your content to reduce the load on your Apache processes.
Your 'top' output shows that you have plenty of free memory, so I don't think that MaxClients is an issue (unless there is some problem with Apache allocating more than 2GB of memory?). Your error log should show errors if it is having problems creating more children.
Most likely, your Apache processes really are using a lot of resources. If you are running PHP apps, try installing eAccelerator which does a good job of optimizing and caching PHP code. Other things might include heavy MySQL queries, a slow DNS resolver, etc. Beyond that, it gets more into understanding what programs are being hit and what they are doing.