I have a function that needs to go over around 20K rows from an array, and apply an external script to each. This is a slow process, as PHP is waiting for the script to be executed before continuing with the next row.
To make this faster, I was thinking of running the function in several parts at the same time: for example, rows 0 to 2000 in one call, 2001 to 4000 in another, and so on. How can I do this in a neat way? I could set up separate cron jobs, one per call with different params: myFunction(0, 2000), then another cron job with myFunction(2001, 4000), etc., but that doesn't seem too clean. What's a good way of doing this?
I'm not sure whether this fits your situation, but you can redirect the output of system calls to a file, so that PHP does not wait until the program has finished. Be aware that starting many programs at once this way may overload your server.
http://www.php.net/manual/en/function.exec.php - If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
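A minimal sketch of what the manual note describes, assuming a Unix-like shell. The helper name runInBackground(), the trailing "&" that detaches the command, the /dev/null default, and the "sleep 2" stand-in for the real external script are all assumptions for illustration, not part of the original answer.

```php
<?php
// Start a command without waiting for it to finish.
// Hypothetical helper; assumes a Unix-like shell is available.
function runInBackground(string $command, string $logFile = '/dev/null'): void
{
    // Redirect stdout and stderr to a file, then detach with "&".
    // Without the redirect, exec() blocks until the command exits.
    exec(sprintf('%s > %s 2>&1 &', $command, escapeshellarg($logFile)));
}

$start = microtime(true);
runInBackground('sleep 2'); // stand-in for the real external script
$elapsed = microtime(true) - $start;

printf("exec() returned after %.3f seconds\n", $elapsed);
```

Nothing here limits how many commands run at once, so calling it 20K times in a loop is exactly the overload case mentioned above.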
If you'd like to execute parallel tasks in PHP, I would consider using Gearman. Another approach would be to use pcntl_fork(), but I'd prefer actual workers when it's task based.
Have a look at pcntl_fork. This allows you to spawn child processes which can then do the separate work that you need.
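As a rough sketch of the fork approach: split the rows into chunks and let one child process handle each chunk. This assumes the pcntl extension is available (CLI on Unix-like systems). processRow() and processInParallel() are hypothetical names, and the usleep() call only stands in for the external script.

```php
<?php
// Hypothetical stand-in for applying the external script to one row.
function processRow($row): void
{
    usleep(1000);
}

// Fork one child per chunk and return how many children exited cleanly.
function processInParallel(array $rows, int $workers): int
{
    $size = max(1, (int) ceil(count($rows) / max(1, $workers)));
    $pids = [];

    foreach (array_chunk($rows, $size) as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            // Child: work through its own chunk, then exit.
            foreach ($chunk as $row) {
                processRow($row);
            }
            exit(0);
        }
        // Parent: remember the child so it can be waited on.
        $pids[] = $pid;
    }

    $ok = 0;
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $ok++;
        }
    }
    return $ok;
}

$finished = processInParallel(range(1, 200), 4);
echo "$finished workers finished cleanly\n";
```

Children do not share variables with the parent, so any results would have to be written somewhere both can reach, such as a file or a database.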
You can use pthreads.
It is very easy to install and works great on Windows.
Download it from here -> http://windows.php.net/downloads/pecl/releases/pthreads/2.0.4/
Extract the zip file and then:
move the file 'php_pthreads.dll' to the php\ext\ directory.
move the file 'pthreadVC2.dll' to the php\ directory.
Then add this line to your 'php.ini' file: extension=php_pthreads.dll
Save the file.
That's it, you're done :-)
Now let's see an example of how to use it:
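The original example is not shown here, so the following is only a sketch of how a pthreads worker might look, assuming the pthreads extension is loaded on a thread-safe PHP build. RangeWorker is a hypothetical class name, and the loop body is a placeholder for the call to the external script. The class is declared conditionally so the file still runs when the extension is missing.

```php
<?php
if (extension_loaded('pthreads')) {
    // Hypothetical worker: handles rows $from..$to in its own thread.
    class RangeWorker extends Thread
    {
        public $from;
        public $to;
        public $processed = 0;

        public function __construct($from, $to)
        {
            $this->from = $from;
            $this->to = $to;
        }

        // run() is executed in the new thread once start() is called.
        public function run()
        {
            $count = 0;
            for ($i = $this->from; $i <= $this->to; $i++) {
                // Placeholder: apply the external script to row $i here.
                $count++;
            }
            $this->processed = $count;
        }
    }

    $workers = [new RangeWorker(0, 2000), new RangeWorker(2001, 4000)];
    foreach ($workers as $worker) {
        $worker->start();
    }
    foreach ($workers as $worker) {
        $worker->join(); // wait for the thread to finish
        echo "Processed {$worker->processed} rows\n";
    }
} else {
    echo "pthreads extension is not loaded\n";
}
```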
For more information about pthreads, read the pthreads section of the PHP docs.
If you're using WAMP like me, you should also copy 'pthreadVC2.dll' into \wamp\bin\apache\ApacheX.X.X\bin, then edit the 'php.ini' file in that same path and add the same line as before:
extension=php_pthreads.dll
Good luck!
The only waiting you can avoid is the gap between fetching the data and processing it. The processing itself is blocking no matter what: you simply have to wait for it. You are unlikely to gain anything by running more processes than you have cores, so the number of processes stays small, and scheduling 2-8 of them by hand doesn't sound that hideous. If you are worried about not being able to process data while it is still being retrieved, you could in theory fetch the data from the database in small blocks and distribute the processing between a few processes, one for each core.
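One way to keep the number of concurrent processes at or below the core count is a small pool built on proc_open(). This is a sketch under assumptions: runPool() is a hypothetical helper, the shell commands stand in for the real external script, and the limit of 2 is arbitrary rather than a detected core count.

```php
<?php
// Run shell commands with at most $maxParallel active at any time.
// Returns the exit code of each command, keyed by its position.
function runPool(array $commands, int $maxParallel): array
{
    $commands = array_values($commands);
    $maxParallel = max(1, $maxParallel);
    $next = 0;
    $running = [];   // position => process resource
    $exitCodes = [];

    while ($next < count($commands) || $running) {
        // Top up the pool while there is room and work left.
        while ($next < count($commands) && count($running) < $maxParallel) {
            $proc = proc_open($commands[$next], [], $pipes);
            if ($proc === false) {
                $exitCodes[$next] = -1; // could not start the command
            } else {
                $running[$next] = $proc;
            }
            $next++;
        }

        // Collect every process that has finished since the last check.
        foreach ($running as $i => $proc) {
            $status = proc_get_status($proc);
            if (!$status['running']) {
                $exitCodes[$i] = $status['exitcode'];
                proc_close($proc);
                unset($running[$i]);
            }
        }

        usleep(10000); // brief pause so the loop does not spin
    }

    ksort($exitCodes);
    return $exitCodes;
}

$codes = runPool(['sleep 0.2', 'sleep 0.2', 'true', 'false'], 2);
print_r($codes);
```

Because the pool starts the next command as soon as a slot frees up, a slow row only holds up its own slot and not the rest of the batch.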
For actually running the processing, I lean towards the approach of forking child processes. There is a brilliant demonstration in the comments on the pcntl_fork doc page showing an implementation of a job daemon class:
http://php.net/manual/en/function.pcntl-fork.php