In a system I am currently working on, there is one process that loads a large amount of data into an array for sorting/aggregating/whatever. I know this process needs optimising for memory usage, but in the short term it just needs to work.
Given the amount of data loaded into the array, we keep hitting the memory limit. It has been increased several times, and I am wondering: is there a point where increasing it generally becomes a bad idea, or is it only a matter of how much RAM the machine has?
The machine has 2GB of RAM and the memory_limit is currently set at 1.5GB. We can easily add more RAM to the machine (and will anyway).
Have others encountered this kind of issue? and what were the solutions?
The configuration for the `memory_limit` of PHP running as an Apache module to serve web pages has to take into consideration how many Apache processes you can have at the same time on the machine (see the `MaxClients` configuration option for Apache).
If `MaxClients` is 100 and you have 2,000 MB of RAM, a very quick calculation shows that you should not use more than 20 MB *(because 20 MB × 100 clients = 2 GB of RAM, i.e. the total amount of memory your server has)* for the memory_limit value.
And this is without considering that there are probably other things running on the same server, like MySQL, the system itself, ... And that Apache is probably already using some memory for itself.
Of course, this is also a "worst case scenario", which assumes that each PHP page is using the maximum amount of memory it can.
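The quick calculation above can be sketched in a few lines of PHP. This is only a rough worst-case estimate; the function name `perProcessBudgetMb` and the 500 MB reserved for the OS, MySQL and Apache itself are made-up illustration values, not measurements.

```php
<?php
// Worst-case budget: every Apache child uses its full memory_limit at once.
// All figures are illustrative assumptions, not measured values.
function perProcessBudgetMb(int $totalRamMb, int $reservedMb, int $maxClients): int
{
    if ($maxClients < 1) {
        throw new InvalidArgumentException('maxClients must be at least 1');
    }
    // Memory left for PHP after the OS, MySQL, Apache itself, etc.
    $availableMb = max(0, $totalRamMb - $reservedMb);

    return intdiv($availableMb, $maxClients);
}

// The figures from the text: 2,000 MB of RAM, 100 clients, nothing reserved.
echo perProcessBudgetMb(2000, 0, 100), " MB\n";   // 20 MB
// Hypothetically reserving 500 MB for everything else on the box.
echo perProcessBudgetMb(2000, 500, 100), " MB\n"; // 15 MB
```

Seen this way, a 1.5 GB memory_limit on a 2 GB machine only makes sense if a single PHP process ever runs at a time.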
In your case, if you need such a big amount of memory for only one job, I would not increase the `memory_limit` for PHP running as an Apache module.
Instead, I would launch that job from the command line (or via a cron job), and specify a higher `memory_limit` specifically in this one and only case.
This can be done with the `-d` option of `php` (note that the shorthand suffix is `G`, not `GB`; PHP only understands K, M and G), like this:
$ php -d memory_limit=1G temp.php
string(2) "1G"
Here, temp.php only contains:
<?php
var_dump(ini_get('memory_limit'));
In my opinion, this is way safer than increasing the memory_limit for the PHP module for Apache -- and it's what I usually do when I have a large dataset, or some really heavy stuff I cannot optimize or paginate.
If you need to define several values for the PHP CLI execution, you can also tell it to use another configuration file, instead of the default php.ini, with the `-c` option:
php -c /etc/phpcli.ini temp.php
That way, you have:
- `/etc/php.ini` for Apache, with a low `memory_limit`, a low `max_execution_time`, ...
- and `/etc/phpcli.ini` for batches run from the command line, with virtually no limit
This ensures your batches will be able to run, and you'll still have security for your website (`memory_limit` and `max_execution_time` being security measures).
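To confirm that a batch really picks up the CLI configuration and not the Apache one, a small check like the following can help. It is a minimal sketch: the output depends entirely on your setup, and the guard at the end is just one possible way to stop the batch from being triggered through the web server.

```php
<?php
// Report which SAPI and configuration file this PHP process is using.
$iniFile = php_ini_loaded_file(); // false when no php.ini was loaded

printf("SAPI:               %s\n", PHP_SAPI);
printf("Loaded ini file:    %s\n", $iniFile === false ? '(none)' : $iniFile);
printf("memory_limit:       %s\n", ini_get('memory_limit'));
printf("max_execution_time: %s\n", ini_get('max_execution_time'));

// Optional guard for a batch script: refuse to run under the web server.
if (PHP_SAPI !== 'cli') {
    exit("This batch must be run from the command line.\n");
}
```

Running it once with `php -c /etc/phpcli.ini` and once without shows whether the two configurations really differ.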
Still, if you have the time to optimize your script, you should; for instance, in that kind of situation where you have to deal with lots of data, pagination is a must-have ;-)
Have you tried splitting the dataset into smaller parts and processing only one part at a time?
If you fetch the data from a disk file, you can use the `fread()` function to load smaller chunks, or, in the case of a database, some sort of unbuffered query.
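As a rough illustration of the chunked approach, here is a sketch that counts the lines of a file with `fread()` without ever holding more than one chunk in memory. The function name `countLinesInChunks`, the chunk size and the temporary demo file are all made up for the example; a real job would do its own aggregation inside the loop.

```php
<?php
// Count newline characters in a file, reading it one fixed-size chunk at a time.
function countLinesInChunks(string $path, int $chunkSize = 8192): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = 0;
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            // Only this chunk is in memory; the previous one is discarded.
            $lines += substr_count($chunk, "\n");
        }
    } finally {
        fclose($handle);
    }

    return $lines;
}

// Demo on a throwaway file of 1000 short rows, read 64 bytes at a time.
$path = tempnam(sys_get_temp_dir(), 'chunk');
file_put_contents($path, str_repeat("row\n", 1000));
echo countLinesInChunks($path, 64), "\n"; // 1000
unlink($path);
```

Memory use stays around the chunk size no matter how large the file is, which is the whole point compared to loading everything into one array.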
I haven't kept up with PHP since v3.something, but you could also use a form of cloud computing. A 1GB dataset seems big enough to be worth processing on multiple machines.
Given that you know that there are memory issues with your script that need fixing and you are only looking for short-term solutions, then I won't address the ways to go about profiling and solving your memory issues. It sounds like you're going to get to that.
So, I would say the main things you have to keep in mind are:
- Total memory load on the system
- OS capabilities
PHP is only one small component of the system. If you allow it to eat up a vast quantity of your RAM, then the other processes will suffer, which could in turn affect the script itself. Notably, if you are pulling a lot of data out of a database, then your DBMS might require a lot of memory in order to create result sets for your queries. As a quick fix, you might want to identify any queries you are running and free the results as soon as possible to give yourself more memory for a long job run.
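The effect of freeing data early can be seen with `memory_get_usage()`. The sketch below uses a plain array as a stand-in for a result set (the `loadRows` helper and the row count are invented for the demo); with a real database you would call something like `mysqli_result::free()` or `PDOStatement::closeCursor()` at the same point.

```php
<?php
// Stand-in for a large result set; a real job would get this from a query.
function loadRows(int $count): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = ['id' => $i, 'value' => $i % 100];
    }
    return $rows;
}

$before = memory_get_usage();
$rows   = loadRows(50000);
$loaded = memory_get_usage();

// Aggregate, then drop the rows as soon as they are no longer needed.
$total = array_sum(array_column($rows, 'value'));
unset($rows);
$afterFree = memory_get_usage();

printf(
    "while loaded: +%d KB, after unset: +%d KB, total = %d\n",
    ($loaded - $before) / 1024,
    ($afterFree - $before) / 1024,
    $total
);
```

The exact figures vary by PHP version, but the usage after `unset()` should drop back close to the starting point, leaving that memory available for the rest of the job.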
In terms of OS capabilities, you should keep in mind that 32-bit systems, which you are likely running on, can only address up to 4GB of RAM without special handling. Often the limit can be much less depending on how it's used. Some Windows chipsets and configurations can actually have less than 3GB available to the system, even with 4GB or more physically installed. You should check to see how much your system can address.
You say that you've increased the memory limit several times, so obviously this job is growing larger and larger in scope. If you're up to 1.5GB, then even installing 2GB more RAM sounds like it will just be a short reprieve.
> Have others encountered this kind of issue? and what were the solutions?
I think you probably already know that the only real solution is to break down and spend the time to optimize the script soon, or you'll end up with a job that will be too big to run.