Multithreading/Parallel Processing in PHP

Posted 2020-02-13 02:45

Question:

I have a PHP script that generates a report with PHPExcel from data queried from a MySQL DB. Currently, processing is linear: it gets the data back from MySQL, reads in the Excel template, writes the data to the template, then outputs it. I have optimized the code to the point that the data is iterated over only once, and there is very little processing on the PHP side. The query returns hundreds of rows in less than 0.001 seconds, so it is running fast enough. After some timing, I found my bottlenecks to be (surprise, surprise) reading the template and writing the output. I would like to do this:

Spawn a thread/process to read the template
Spawn a thread/process to fetch the data
Return to the parent thread; the parent waits until both are complete
Proceed on as normal

My main questions are: is this possible, and is it worth it? If yes to both, how would you tackle it? Also, this is PHP 5 on CentOS.

Answer 1:

It is generally not a good idea to fork an Apache process; that can cause unpredictable results. Using some kind of queuing mechanism is preferable. Gearman is an open-source queuing system you can use. I also have a blog post about running tasks asynchronously with the Zend Server Job Queue: "Do you queue? Introduction to the Zend Server Job Queue".

You could also use something like the Zend Framework queuing classes (Zend_Queue) to implement some of the asynchronous work.
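To make the queuing idea concrete, here is a minimal sketch using Gearman. It assumes the PECL gearman extension and a gearmand server on 127.0.0.1:4730; the job name build_report, queueReport(), runReportWorker() and the $buildReport callable are all hypothetical. The Zend Server Job Queue and Zend_Queue have their own APIs that are not shown here.

```php
<?php
// Sketch only: assumes the PECL gearman extension and gearmand on localhost.
// 'build_report', queueReport(), runReportWorker() and $buildReport are
// hypothetical names for illustration.

// Web request side: hand the slow report work to the queue and return at once.
function queueReport(array $params)
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $handle = $client->doBackground('build_report', json_encode($params));
    if ($client->returnCode() !== GEARMAN_SUCCESS) {
        throw new RuntimeException('could not queue report job');
    }
    return $handle; // job handle; can be checked later with jobStatus()
}

// Long-running CLI worker: does the PHPExcel work outside of Apache.
function runReportWorker($buildReport)
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('build_report', function (GearmanJob $job) use ($buildReport) {
        $params = json_decode($job->workload(), true);
        call_user_func($buildReport, $params);
    });
    // work() blocks until a job arrives; it returns false on error.
    while ($worker->work()) {
        // one job handled per iteration
    }
}
```

The point of this design is that the Apache request never forks: it only enqueues, and a separate CLI worker process does the slow template read and write.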

@Swisstack, I also disagree with your assertion that PHP is not built for high performance. Language features are very seldom the cause of slow performance. A raw benchmark comparing $a++ across different languages might show a difference, but that type of testing is irrelevant. I've done PHP consulting for several years and have never seen a performance problem that was caused by the language.



Answer 2:

I would try to figure out whether you can cache or store the template in some format that is faster to read. I don't know if that's possible, but the PHPExcel forum is pretty good and is watched by the developers.
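One possible way to store the template in a faster-to-read form is to parse it once and keep a serialized copy on disk until the template file changes. This is a sketch under an unverified assumption: that the parsed template object survives serialize()/unserialize() intact (not checked for PHPExcel objects), and whether unserializing actually beats re-parsing has to be measured. loadTemplateCached() and the $parse callable (for example, a wrapper around PHPExcel_IOFactory::load()) are hypothetical.

```php
<?php
// Sketch: reuse a serialized copy of the parsed template until the
// template file changes. loadTemplateCached() and $parse are hypothetical.
// Only use unserialize() on cache files your own code wrote.
function loadTemplateCached($templatePath, $cachePath, $parse)
{
    // Reuse the cache only if it is at least as new as the template.
    if (is_file($cachePath) && filemtime($cachePath) >= filemtime($templatePath)) {
        return unserialize(file_get_contents($cachePath));
    }

    // Slow path: parse the real template file.
    $template = call_user_func($parse, $templatePath);

    // Write to a temp file and rename it, so concurrent requests never
    // read a half-written cache file.
    $tmp = $cachePath . '.' . getmypid();
    file_put_contents($tmp, serialize($template));
    rename($tmp, $cachePath);

    return $template;
}
```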



Answer 3:

You can't multithread, but you can fork (pcntl_fork, pcntl_wait). As I'm sure you know, you'll want to carefully test process spawn times to make sure this is even worth it in your situation.

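$pid = pcntl_fork();

if ($pid == -1) {
  // fork failed
  die('could not fork');

} elseif ($pid > 0) {
  // we're the parent! Wait for the child to finish
  pcntl_waitpid($pid, $status);

} else {
  // we're the child: do the work here, then exit so the child
  // doesn't fall through into the parent's remaining code
  exit(0);
}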
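A forked child does not share memory with the parent, so it cannot simply hand back the template or the rows; each result has to be sent back explicitly. The following is a minimal sketch for the CLI (not under Apache, per Answer 1), assuming the pcntl extension is available. It runs each task in its own child and returns the serialized result over a stream_socket_pair(). runInParallel() and the task closures are hypothetical names. A child that talks to MySQL should open its own connection, because a connection opened before the fork would be shared by both processes.

```php
<?php
// Sketch (CLI only, needs pcntl): run each callable in a forked child and
// collect its return value in the parent via a Unix socket pair.
function runInParallel(array $tasks)
{
    $children = array();
    foreach ($tasks as $name => $task) {
        $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        if ($pair === false) {
            throw new RuntimeException("could not create socket pair for $name");
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("could not fork for $name");
        }
        if ($pid === 0) {
            // Child: run the task, send the serialized result, then exit.
            fclose($pair[0]);
            fwrite($pair[1], serialize(call_user_func($task)));
            fclose($pair[1]);
            exit(0);
        }
        // Parent: keep its own end of the pair; all children run concurrently.
        fclose($pair[1]);
        $children[$name] = array($pid, $pair[0]);
    }

    $results = array();
    foreach ($children as $name => $child) {
        list($pid, $sock) = $child;
        // Read until the child closes its end (EOF), then reap the child.
        $results[$name] = unserialize(stream_get_contents($sock));
        fclose($sock);
        pcntl_waitpid($pid, $status);
    }
    return $results;
}
```

For the plan in the question this would be called roughly as runInParallel(array('template' => $readTemplate, 'data' => $fetchRows)), where both callables are your own code.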


Answer 4:

If both reading the template AND the DB query were slow, I'd say there's a decent chance that worthwhile performance could be gained by running the tasks in parallel. But, as you said yourself, reading the template is slow and the DB query is fast. So, even ignoring the extra overhead introduced by the changes needed to run the tasks in parallel, in the best case you stand to save 0.001 seconds (the time needed for the DB query).

Running multiple tasks in parallel still takes at least as long as the slowest task; running them in series takes the sum of all tasks. In your case, that sum is templateTime + queryTime (0.001 s).
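The same arithmetic, worked through with a made-up template time of 1.5 s (hypothetical; only the 0.001 s query time comes from the question) and with fork/IPC overhead ignored, which is the best case for parallelism:

```php
<?php
// Best-case parallel speedup for two tasks; process overhead is ignored.
$templateTime = 1.5;     // hypothetical template read time, seconds
$queryTime    = 0.001;   // query time quoted in the question, seconds

$serial   = $templateTime + $queryTime;      // one task after the other
$parallel = max($templateTime, $queryTime);  // bounded by the slowest task
$saved    = $serial - $parallel;             // equals the faster task's time

printf("serial %.3f s, parallel %.3f s, saved %.3f s\n", $serial, $parallel, $saved);
// prints: serial 1.501 s, parallel 1.500 s, saved 0.001 s
```

Whatever the real template time is, the saving can never exceed the time of the faster task, which here is the 0.001 s query.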

Not worth it imo.

Usually the database is the turtle in the equation, and you can make that part asynchronous without too much effort. See the newly added mysqli_poll() and related functions.
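A minimal sketch of that asynchronous pattern, assuming mysqli built against mysqlnd (required for MYSQLI_ASYNC and mysqli_poll()). It sends the query without blocking, does the slow template read while MySQL works, then collects the rows. queryWhileLoading() and the $loadTemplate callable (for example, a wrapper around the PHPExcel template load) are hypothetical. With the 0.001 s query from the question the gain would be negligible; this pays off when the query is the slow part.

```php
<?php
// Sketch, assuming mysqlnd: overlap a slow query with local work.
// queryWhileLoading() and $loadTemplate are hypothetical names.
function queryWhileLoading(mysqli $link, $sql, $loadTemplate)
{
    // Returns immediately; the query executes on the server meanwhile.
    $link->query($sql, MYSQLI_ASYNC);

    // Do the slow local work while the database is busy.
    $template = call_user_func($loadTemplate);

    // Wait until the connection has a result, polling in 1-second slices.
    do {
        $read = $error = $reject = array($link);
        $ready = mysqli_poll($read, $error, $reject, 1);
        if ($ready === false) {
            throw new RuntimeException('mysqli_poll() failed');
        }
    } while ($ready === 0);

    // Fetch the finished result set.
    $result = $link->reap_async_query();
    if ($result === false) {
        throw new RuntimeException('Query failed: ' . $link->error);
    }
    $rows = $result->fetch_all(MYSQLI_ASSOC);
    $result->free();

    return array($template, $rows);
}
```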



Answer 5:

You can definitely spawn processes on CentOS with PHP (http://php.net/manual/en/function.pcntl-fork.php). Before doing that, though, I'd consider at least one thing: if the bottleneck appears to be reading the template and writing the output, it might be a purely I/O-bound issue, and dealing with multiple processes might not help much. Personally, I'd try to see whether it's possible to do some caching instead.



Answer 6:

Read the template once, then clone it for each workbook you need to create from the data.
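A sketch of this load-once, clone-per-workbook idea. The Sheet and TemplateWorkbook classes below are hypothetical stand-ins so the example runs on plain PHP. With PHPExcel you would clone the object returned by PHPExcel_IOFactory::load(), which, as far as I know, defines its own __clone() for a deep copy. The key detail is that a clone must deep-copy nested objects; otherwise every "copy" writes into the same sheets.

```php
<?php
// Hypothetical stand-ins for a parsed workbook and its sheets.
class Sheet
{
    public $cells = array();
}

class TemplateWorkbook
{
    public static $loads = 0;   // counts how often the slow parse ran
    public $sheets = array();

    public static function load()
    {
        self::$loads++;         // stands in for the expensive template read
        $wb = new self();
        $wb->sheets[] = new Sheet();
        $wb->sheets[0]->cells['A1'] = 'Report title';
        return $wb;
    }

    // Deep copy: without this, clones would share the same Sheet objects.
    public function __clone()
    {
        foreach ($this->sheets as $i => $sheet) {
            $this->sheets[$i] = clone $sheet;
        }
    }
}

$template = TemplateWorkbook::load();   // parsed exactly once

$reports = array();
foreach (array('north', 'south') as $region) {
    $wb = clone $template;              // cheap in-memory copy per report
    $wb->sheets[0]->cells['A2'] = "Region: $region";
    $reports[$region] = $wb;
}
```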