php multithreading problem [closed]

2019-09-08 15:52发布

问题:

I am writing a php cron job that reads thousands of feeds / web pages using curl and stores the content in a database. How do I restrict the number of threads to, lets say, 6? i.e., even though I need to scan thousands of feeds / web pages, I want only 6 curl threads active at any time so that my server and network don't get bogged down. I could do it easily in Java using wait, notify, notifyall methods of Object. Should I build my own semaphore or does php provide any built-in functions?

回答1:

First of all, PHP doesn't have threads, but it does have process control: http://php.net/manual/en/book.pcntl.php

I've built a class around these functions to help with my multi-process requirements.

I'm in a similar situation. I'm keeping a log of the processes that get started from cron and their state. I'm checking on them from a related cron job.

EDIT (more details):

In my project I log all the key changes to the database. Actions may then be taken if the changes meet the actions criterion. So what I'm doing is different to you. However, there are some similarities.

When I fork a new process, I enter it's pid in a DB table. Then next time the cron job kicks in, part of what it does is check to see if the processes have completed properly, and then mark the action as completed in that DB table.

You don't give many details about your project. So I will just throw out a suggestion:

  • A DB table holds the URLs of the resources you want to download.
  • Another table holds the pids of the running processes.
  • A cron job that is run every hour will go through the table and download the resource and store it in a DB. However, first it checks the pid table for complete/dead/running processes and acts accordingly. Here you can limit your processes to 6.

Depending on the size of your project, this may seem like over kill. However, I've thought about it for a long long time, and I want to keep track of all those forked processes. Forking can be risky business, and can lead to system resource overload - speaking from experience ;)

I'd be interested to hear other techniques as well.



回答2:

From my reply at PHP using proc_open so that it doesn't wait for the script it opens (runs) to finish?

Some of my code when i played around with proc_open
I had issues with proc_close (10 to 30 seconds) so i just killed the process using linux command kill

Curl sometimes freezez for me on various servers (ubuntu, centos) but not on all of them, so i kill any "child" processes that take over 40 seconds because normally the script would take 10 second at maximum and i'd rather redo the work than wait a minute or so for curl to un-freeze.



$options=array();
$option['sleep-after-destroy']=0;
$option['sleep-after-create']=0;
$option['age-max']=40;
$option['dir-run']=dirname(__FILE__);
$option['step-sleep']=1;
$option['workers-max']=6;
$option['destroy-forcefull']=1;

$workers=array();

function endAWorker($i,$cansleep=true) {
        global $workers;
        global $option;
        global $child_time_limit;
        if(isset($workers[$i])) {
                @doE('Ending worker [['.$i.']]'."\n");
                if($option['destroy-forcefull']==1) {
                        $x=exec('ps x | grep "php check_working_child.php '.$i.' '.$child_time_limit.'" | grep -v "grep" | grep -v "sh -c"');
                        echo 'pscomm> '.$x."\n";
                        $x=explode(' ',trim(str_replace("\t",' ',$x)));
                        //print_r($x);
                        if(is_numeric($x[0])) {
                                $c='kill -9 '.$x[0];
                                echo 'killcommand> '.$c."\n";
                                $x=exec($c);
                        }
                }
                @proc_close($workers[$i]['link']);
                unset($workers[$i]);
        }
        if($cansleep==true) {
                sleep($option['sleep-after-destroy']);
        }
}

function startAWorker($i) {
        global $workers;
        global $option;
        global $child_time_limit;

        $runcommand='php check_working_child.php '.$i.' '.$child_time_limit.' > check_working_child_logs/'.$i.'.normal.log';
        doE('Starting [['.$i.']]: '.$runcommand."\n");
        $workers[$i]=array(
                'desc' => array(
                        0 => array("pipe", "r"),
                        1 => array("pipe", "w"),
                        2 => array("file", 'check_working_child_logs/'.$i.'.error.log', "a")
                        ),
                'pipes'                 => null,
                'link'                  => null,
                'start-time'    => mktime()
                );
        $workers[$i]['link']=proc_open(
                $runcommand,
                $workers[$i]['desc'],
                $workers[$i]['pipes'],
                $option['dir-run']
                );
        sleep($option['sleep-after-create']);
}

function checkAWorker($i) {
        global $workers;
        global $option;
        $temp=proc_get_status($workers[$i]['link']);
        if($temp['running']===false) {
                doE('Worker [['.$i.']] finished'."\n");
                if(is_file('check_working_child_logs/'.$i.'.normal.log') && filesize('check_working_child_logs/'.$i.'.normal.log')>0) {
                        doE('--------'."\n");
                        echo file_get_contents('check_working_child_logs/'.$i.'.normal.log');
                        doE('-------'."\n");
                }
                endAWorker($i);
        } else {
                if($option['age-max']>0) {
                        if($workers[$i]['start-time']+$option['age-max']$v) {
                endAWorker($i,false);
        }
        @doE('Done killing workers.'."\n");
}

register_shutdown_function('endAllWorkers');

while(1) {
        $step++;
        foreach($workers as $index=>$v) {
                checkAWorker($index);
        }
        if(count($workers)==$option['workers-max']) {
        } elseif(count($workers)$option['workers-max']) {
                $wl=array_keys($workers);
                $wl=array_pop($wl);
                doE('Killing worker [['.$wl.']]');
                endAWorker($wl[0]);
        }
}

And create a file named 'check_working_child.php' to do all the work, the first parameter will be the instance number and the second the time limit php check_working_child.php 5 60 means you are the 5th child and are allowed to run 60 seconds

If the above code does not run let me know, i will post it using pastebin or something...