I have recently encountered an issue with php-fpm processes usage (As in the amount of active processes) peaking to the maximum available processes and by that stopping execution of other scripts until the problematic processes finish.
In a bit more detail,
my current php-fpm settings are:
pm = static
pm.max_children = 100
I am watching the php-fpm's status page, which most of the time shows:
total processes: 100
idle processes: 95-99
active processes: 1-5
which is normal.
However, ever few minutes the active processes count jumps to 100 for a few seconds, and then goes back to normal of 1-5. in that time all other scripts running on the server are simply stuck for that period of time. (From the browser you simply see the page waiting).
Now, i have checked to see if its in specific traffic spikes, but its not. it can also occur with the lowest traffic count of the day.
I believe that a certain script, maybe even only in specific situations, is causing php to simply use all available processes for some reason.
This issue started once we moved to PHP 5.4.X from 5.2.X
We currently have around 60 websites, so its kinda hard going through each website's pages and checking it.
There is nothing in the nginx logs (Nothing critical anyway, there are a few Notices and such).
What i'm trying to do, is somehow trace/profile/monitor which php-fpm script is using the processes so i will know where to start looking for the problem.
Is this possible? Maybe a different approach?
Update
Here is a graph of the PHP-FPM process count in 1 hour, in 1 min intervals:
I have marked in red the jumps that i'm talking about.
The memory usage at the time of the spike stays the same
In your php-fpm log file you should be able to see something like:
WARNING: [pool www-images] server reached pm.max_children setting (5), consider raising it.
for when the number of active processes hits the limits. You should be able to correlate that with the requests that are coming in.
If that doesn't show any pattern of which requests are causing the issue then you should add slow logging to your php-fpm config:
request_slowlog_timeout = 10
slowlog = /var/log/php-fpm/slow.$pool.log
The will log a stack trace for each request that takes up more than the slowlog_timeout limit.
If that still doesn't show anything then either your internal application logging should show where the slow down occurs.
If that doesn't have enough detail then you could use strace as a last resort, which will show which system calls are being made. That will produce a torrent of information. I'd recommend only attaching it to a single process strace -p PID
where PID is the processID of a php-fpm instance.
it can also occur with the lowest traffic count of the day.
That should definitely show up in the php-fpm slow logging. However if that only shows you what request are slow, but doesn't help you figure out why, you can add debugging using the auto pre and post-pend files, in your PHP-FPM config file.
php_value[auto_prepend_file]=/php_shared/prepend.php
php_value[auto_append_file]=/php_shared/postpend.php
Or really simply
You can set up the PHP-FPM status page.
Add this to your PHP-FPM pool config:
pm.status_path = /www-status
And pass the requests through nginx to PHP-FPM
location ~ ^/(www-status)$ {
include %mysite.root.directory%/conf/fastcgi.conf;
fastcgi_pass unix:%phpfpm.socket%/php-fpm-www.sock;
# or IP address
# fastcgi_pass 127.0.0.1:9000;
#If you're fastcgi.conf doesn't set the query_string
#pass the query string here instead.
# fastcgi_param QUERY_STRING $query_string;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
allow 127.0.0.1;
allow stats_collector.localdomain;
allow watchdog.localdomain;
deny all;
}
Then going to yoursite.com/www-status?full will give you a big print out of every php-fpm process like:
pool: www
process manager: dynamic
start time: 18/Mar/2013:20:17:21 +1100
start since: 243
accepted conn: 3
listen queue: 0
max listen queue: 0
listen queue len: 0
idle processes: 3
active processes: 1
total processes: 4
max active processes: 1
max children reached: 0
slow requests: 0
************************
pid: 6233
state: Idle
start time: 18/Mar/2013:20:17:21 +1100
start since: 243
requests: 1
request duration: 631
request method: GET
request URI: /www-status
content length: 0
user: -
script: /documents/projects/intahwebz/intahwebz/basereality/www-status
last request cpu: 0.00
last request memory: 262144
btw I bet it some silly query that's locking up your database.